[Few-shot segmentation] MSANet: Multi-Similarity and Attention Guidance for Boosting Few-Shot Segmentation
2022-07-27 04:18:00 【Chestnut vegetable】

Paper link: MSANet
Code link: MSANet-code
Abstract
Few-shot segmentation (FSS) aims to segment objects of unseen classes given only a few densely annotated samples. Prototype learning, in which features extracted from the support images are averaged over global and local object information to produce one or more prototypes, has been widely used in FSS. However, prototype vectors alone may not be enough to represent the features of all support images. To extract richer features and make more accurate predictions, we propose a Multi-Similarity and Attention Network (MSANet) with two new modules: a multi-similarity module and an attention module. The multi-similarity module uses multiple feature maps of the support and query images to estimate accurate semantic relationships. The attention module directs MSANet to focus on class-relevant information.
Introduction
With the maturing of large-scale datasets [9,10,13,26], supervised convolutional neural networks (CNNs) have shown great potential in semantic segmentation [1,34,40,41,49]. The performance of these supervised networks depends heavily on the quality and quantity of the training data, for example the number of well-annotated samples, the balance of the class distribution, and how representative the samples are. In practice, however, large amounts of annotated data are hard to obtain, especially for dense prediction tasks [2,3,14,21,57,59]. Moreover, conventional supervised networks may generalize poorly to images containing unseen classes. Inspired by the human ability to recognize objects from only a handful of examples, few-shot learning (FSL) [8,42,53,56] was developed. FSL builds networks that can generalize to settings where only a few annotated samples are available. Few-shot segmentation (FSS) [27-29,31,32,35,36,45,46,48,51,54,58,60,61,63,64] is one application of few-shot learning that focuses on semantic segmentation. The goal of FSS is to segment the region of a chosen class in a query image, using a few support images and their corresponding annotation masks. The most popular FSS approach is metric-based prototype learning [51]. As shown in the top half of Figure 1, masked average pooling (MAP) generates one or more class-representative prototype vectors [67], and a feature-processing network then uses these prototypes to segment the target object in the query image. Many researchers have tried to extract more guidance from the prototype vectors through different mechanisms, for example PANet [55], PFENet [51], SG-One [67], CANet [65], and ASGNet [22]. However, because of the masked average pooling operation, such prototype networks may lose detailed spatial information about the image.
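The masked average pooling step described above can be sketched in a few lines of NumPy. This is only an illustration on a toy feature map, not the authors' code: the function names `masked_average_pooling` and `cosine_similarity_map` and the `eps` stabilizer are assumptions made for this example.

```python
import numpy as np

def masked_average_pooling(features, mask, eps=1e-8):
    """Average a (C, H, W) feature map over the pixels where mask == 1.

    Returns a length-C class prototype vector.
    """
    features = np.asarray(features, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    # Zero out background pixels, then sum over the spatial dimensions.
    weighted_sum = (features * mask[None, :, :]).sum(axis=(1, 2))
    return weighted_sum / (mask.sum() + eps)

def cosine_similarity_map(features, prototype, eps=1e-8):
    """Cosine similarity between a prototype and every pixel of a (C, H, W) map."""
    features = np.asarray(features, dtype=np.float64)
    dot = np.tensordot(prototype, features, axes=([0], [0]))  # shape (H, W)
    norms = np.linalg.norm(features, axis=0) * np.linalg.norm(prototype)
    return dot / (norms + eps)

# Toy support feature map (hypothetical values): 2 channels, 2x2 pixels.
support_features = np.array([[[1.0, 1.0], [0.0, 0.0]],
                             [[0.0, 0.0], [1.0, 1.0]]])
support_mask = np.array([[1, 1], [0, 0]])  # the object occupies the top row

prototype = masked_average_pooling(support_features, support_mask)
similarity = cosine_similarity_map(support_features, prototype)
```

Every foreground pixel is collapsed into one vector here, which is where the loss of spatial detail mentioned above comes from.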
To address this, we propose the Multi-Similarity and Attention Network (MSANet), which is built around two guidance modules. As shown in the bottom half of Figure 1, the network contains a multi-layer similarity module and an attention module. Both modules are intended to support the prototype learning paradigm and to guide MSANet toward fine-grained segmentation. Recent research shows that FSS networks can be improved by exploiting the visual correspondence between the support and query images [12]. To establish more meaningful correspondences, dense intermediate-layer features [33,37,38] and correlation tensor learning [24,43,52] have been used. Juhong Min et al. designed HSNet [36], a hypercorrelation squeeze network that operates on 4D tensors of multi-layer dense feature correlations. In a similar spirit, we propose a multi-similarity module that extracts multi-layer feature correlations from the backbone network and applies a simple convolution block to them. We also propose a lightweight CNN attention block that focuses more strongly on the target-class content of the image. Following the architecture of BAM [20], we use a base learner and an ensemble module to refine the segmentation results. Our main contributions to the FSS problem are summarized below:
We propose a multi-layer similarity module that captures informative visual correspondences between the support and query images.
We propose a simple but effective attention module that uses the support image and its corresponding mask to better capture class-relevant information.
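To make the idea of multi-layer feature correlation more concrete, here is a rough NumPy sketch. It is not the authors' implementation: the function names, the choice of taking the maximum over foreground support pixels, and the assumption that all layers share one spatial resolution are simplifications made for this example. In the real network the stacked maps would be passed to a convolution block.

```python
import numpy as np

def layer_similarity(query_feat, support_feat, support_mask, eps=1e-8):
    """For each query pixel, the best cosine similarity to any foreground support pixel.

    query_feat, support_feat: (C, H, W) arrays; support_mask: (H, W) array of 0/1.
    Returns an (H, W) similarity map.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, -1)
    # Keep only support pixels that belong to the annotated object.
    s = support_feat.reshape(c, -1)[:, support_mask.reshape(-1) > 0]
    if s.shape[1] == 0:  # no foreground pixels: nothing to compare against
        return np.zeros((h, w))
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    sim = q.T @ s  # (H*W query pixels, N foreground support pixels)
    return sim.max(axis=1).reshape(h, w)

def multi_layer_similarity(query_feats, support_feats, support_mask):
    """Stack one similarity map per backbone layer into an (L, H, W) array."""
    maps = [layer_similarity(q, s, support_mask)
            for q, s in zip(query_feats, support_feats)]
    return np.stack(maps, axis=0)

# Hypothetical features from three backbone layers with 4, 8, and 16 channels.
rng = np.random.default_rng(0)
support_feats = [rng.standard_normal((c, 5, 5)) for c in (4, 8, 16)]
mask = np.zeros((5, 5))
mask[1:4, 1:4] = 1

# Using the support image as its own query: foreground pixels match themselves.
sims = multi_layer_similarity(support_feats, support_feats, mask)
```

Unlike a single prototype vector, each map keeps a value per query pixel, so spatial layout is preserved for whatever processing follows.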

Related work
Semantic segmentation
Semantic segmentation is the computer vision task of assigning each pixel of an image to one of a set of predefined categories. Thanks to progress in fully convolutional networks (FCNs), many architectures have been proposed to improve segmentation performance, such as the encoder-decoder UNet, PSPNet with its pyramid pooling module (PPM), and DeepLab with atrous spatial pyramid pooling (ASPP). A range of further techniques has also been proposed, including dilated convolution, multi-level feature aggregation, and attention mechanisms. However, conventional segmentation models need ample annotated data and struggle to predict unseen categories without fine-tuning, which limits their practical use to some extent.


Few-shot learning
FSL was introduced to address these problems; it aims to recognize unseen categories from a small number of annotated samples. FSL methods fall into three branches: (i) optimization-based [11,19,42], (ii) augmentation-based [6,7], and (iii) metric-based [23,48,50]. Optimization-based methods propose gradient update strategies to overcome data bias and improve the model's generalization. Augmentation-based methods address data scarcity by generating synthetic training images. Our work is most closely related to metric-based methods, which aim to learn a general metric function for computing the distance between query and support images. These methods have made notable progress. Among them, matching networks use special mini-batches called episodes so that the training and testing conditions match. Relation networks transform the query and support images into 1x1 vectors and then classify them based on cosine similarity (CS). Prototypical networks were also proposed; they directly use the feature representation computed by global average pooling (the prototype).
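As a small illustration of the metric-based idea, the sketch below classifies query embeddings by their nearest class prototype. The embeddings are assumed to be already computed by some backbone, and the helper names and toy values are hypothetical. Cosine similarity is used to match the discussion above; the original prototypical network formulation uses squared Euclidean distance instead.

```python
import numpy as np

def class_prototypes(support_embeddings, support_labels):
    """Average the support embeddings of each class into one prototype per class."""
    embeddings = np.asarray(support_embeddings, dtype=np.float64)
    labels = np.asarray(support_labels)
    classes = np.unique(labels)
    prototypes = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, prototypes

def classify(query_embeddings, classes, prototypes, eps=1e-8):
    """Assign each query embedding to the class with the most similar prototype."""
    q = np.asarray(query_embeddings, dtype=np.float64)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    similarity = q @ p.T  # (num_queries, num_classes) cosine similarities
    return classes[similarity.argmax(axis=1)]

# A 2-way 2-shot episode with hypothetical 2-D embeddings.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
queries = [[0.8, 0.2], [0.2, 0.7]]

classes, prototypes = class_prototypes(support, labels)
predictions = classify(queries, classes, prototypes)
```

No parameters are updated at test time: adding a new class only requires computing one more prototype from its few labeled samples.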
Few-shot semantic segmentation
Shaban et al. proposed OSLSM, one of the pioneering works in FSS, which generates classifier weights for query image segmentation. Its first branch takes the support image as input and produces a parameter vector; the second branch combines these parameters with the query image and outputs a segmentation mask. Later, the prototype learning paradigm was introduced to better extract information from the support and query images. SG-One introduced masked average pooling to compute class-representative prototype vectors and generate a spatial similarity map. CANet proposed a two-branch dense comparison network with an iterative refinement module. PFENet computes the cosine similarity (CS) of high-level features, without trainable parameters, to create a prior mask, and also introduces a feature enrichment module. Instead of prototype expansion, ASGNet proposes superpixel-guided clustering to extract multiple prototypes from the support images and uses an allocation strategy to reconstruct the support feature map. However, most prototype learning methods lose spatial structure, and there is still room for improvement in using class-representative prototype vectors to fully exploit the features of foreground objects. On the other hand, visual correspondence and correlation tensor processing have shown significant results in FSS [36-38]. HSNet is trained to squeeze dense feature correlation tensors and transform them into a segmentation mask through high-dimensional convolutions. However, high-dimensional (4D) convolution has high space and time complexity. To extract lightweight CNN features, DENet, drawing on traditional approaches, introduces a guided attention module to estimate the weights of the novel-class classifier.
Reference [17] proposes an attention-based multi-context guidance network that integrates small- to large-scale context information to guide the query branch globally. BAM does not focus on feature extraction or visual correspondence; instead, it introduces a new scheme for FSS that adds a supervised model trained on the base classes. This supervised model predicts base classes in the query image, which helps the meta learner suppress false predictions. Inspired by recent progress in visual correspondence and attention mechanisms, we propose a multi-layer similarity module and a lightweight attention module within the prototype network framework to take FSS networks to the next level.
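The cost of the 4D correlation approach mentioned above is easy to see in a small sketch. The code below builds a dense correlation tensor between every query pixel and every support pixel for a single layer, using ReLU-clamped cosine similarity on mask-filtered support features. It is a simplified illustration with hypothetical names and random toy features, not the HSNet code.

```python
import numpy as np

def correlation_4d(query_feat, support_feat, support_mask, eps=1e-8):
    """Dense correlation between all query and support pixels of one layer.

    query_feat: (C, Hq, Wq); support_feat: (C, Hs, Ws); support_mask: (Hs, Ws) of 0/1.
    Returns a non-negative (Hq, Wq, Hs, Ws) tensor.
    """
    c, hq, wq = query_feat.shape
    _, hs, ws = support_feat.shape
    q = query_feat.reshape(c, -1)
    # Background support pixels become zero vectors and so correlate to 0.
    s = (support_feat * support_mask[None, :, :]).reshape(c, -1)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + eps)
    corr = np.maximum(q.T @ s, 0.0)  # ReLU keeps only positive correlations
    return corr.reshape(hq, wq, hs, ws)

# Hypothetical 8-channel features on a 4x4 grid.
rng = np.random.default_rng(0)
query_feat = rng.standard_normal((8, 4, 4))
support_feat = rng.standard_normal((8, 4, 4))
support_mask = np.zeros((4, 4))
support_mask[1:3, 1:3] = 1

corr = correlation_4d(query_feat, support_feat, support_mask)
```

The tensor has (Hq * Wq) * (Hs * Ws) entries, so a pair of 50x50 feature maps already yields 6,250,000 values per layer before any 4D convolution is applied. That growth is the motivation for lighter alternatives that reduce the correlation to 2D maps.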




Problem definition

Method

Multi-similarity module




Attention module




Experiments







Conclusion
