当前位置:网站首页>Double contextual relationship network for polyp segmentation
Double contextual relationship network for polyp segmentation
2022-06-28 19:20:00 【sigmoidAndRELU】
Colonoscopy image segmentation paper reading
- The overall structure of the thesis
The overall structure of the thesis
Title of thesis : A dual context network for polyp segmentation (ISBI2022)
Author's unit : Beijing University of Posts and telecommunications
Author's name : Yinzijin et al
Code address : https://github.com/PRIS-CV/DCRNet/blob/master/lib/DCRNet.py
Abstract
Automatic segmentation of polyps in colonoscopy in colorectal cancer (CRC) It plays a key role in the early diagnosis of . However , The diversity of polyp images greatly increases the difficulty of accurate segmentation . The existing research mainly focuses on learning Context information in a single image , But failed to take advantage of Synchronous visual pattern of polyps across images . This article explores context dependency from the overall perspective of the entire data set , A duplex context network is proposed (DCRNet) To capture Within and between images Of Context . Based on the above two similarities , The features of each input region can be enhanced by embedding the context region . In order to store the feature area embedded in the previous image in the training process , Episodic memory is designed and operated as a queue . We are EndoScene、Kvasir-SEG And recently released large-scale PICCOLO The proposed method is evaluated on the dataset . Experimental results show that , What we proposed DCRNet It is superior to the most advanced methods in terms of widely used evaluation indicators .
contribution :
1、 Propose to embed the context area ;
2、 Episodic memory is designed and operated as a queue ;
3、 Put forward DCRNet;
4、 The model performs well on multiple colon cancer datasets .
introduction
Diagnosis and treatment of colon cancer , Regional analysis of polyps is a key step , Polypectomy is a direct method to prevent and treat early colorectal cancer . The colonoscopy image can clearly show the information of the whole patient's colon , However, there are still some difficulties in the localization and segmentation of polyps :1、 Polyps are various ;2、 The boundary between polyp and colonic mucosa is too vague . As shown in the figure :
From the image we can observe , Some are obvious , image a b, The swollen part is , and d It's very exaggerated ,c It's not obvious , You can't see it without looking carefully .
Related work
In the existing work , Here is a brief introduction :
1、 Multi-scale feature extraction network :ACSNet(MICCAI 2020), Combining context information and local details to deal with the problem of polyp feature diversity .
PraNet Using multi-scale feature aggregation , The contour map is extracted according to local features and the segmentation map is refined by up sampling .
2、 Use auxiliary information to constrain the segmentation results :SFANet(MICCAI 2019), Using region boundary constraints , To select feature aggregation , Improve segmentation accuracy .
a key : These jobs , forehead , It seems that we are all looking for feature segmentation on a single image , In this case, it is not related to a recessive lesion similarity , Then select the corresponding segmentation parameters ?? If so , What a model can do is to segment the obvious lesions , For different types of polyp images, the corresponding invisible classification , A simple image is simply divided , Complex images and inconspicuous images are special methods , A lot of sense !
So this article will mention a mechanism , It's called episodic memory !
Theoretical proof :(Content-based medical image retrieval of ct images of
liver lesions using manifold learning) The significance of retrieving from other images in the treatment of radiological lesions has been demonstrated .
Related achievements : It has been used in measurement learning .
therefore , This paper adopts this idea , From the whole point of view of the whole data set, this paper discusses the cross image and the feature association in the image .
Job summary :
1、 Intra image context module
2、 Context relation module outside the image
These two modules are also plug and play .
Model structure
First picture

First, see the network framework diagram , It consists of three parts , Encoder 、 decoder 、 Bottom information processing module .
Codec used in this paper is based on ResNet34 Of UNet, No more details here . Watch the main play directly !
Internal context
class PAM_Module(Module):
""" Position attention module"""
#Ref from SAGAN
def __init__(self, in_dim):
super(PAM_Module, self).__init__()
self.chanel_in = in_dim
self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
self.value_conv = Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
self.gamma = Parameter(torch.zeros(1))
self.softmax = Softmax(dim=-1)
def forward(self, x):
""" inputs : x : input feature maps( B X C X H X W) returns : out : attention value + input feature attention: B X (HxW) X (HxW) """
m_batchsize, C, height, width = x.size()
proj_query = self.query_conv(x).view(m_batchsize, -1, width*height).permute(0, 2, 1)
proj_key = self.key_conv(x).view(m_batchsize, -1, width*height)
energy = torch.bmm(proj_query, proj_key)
attention = self.softmax(energy)
proj_value = self.value_conv(x).view(m_batchsize, -1, width*height)
out = torch.bmm(proj_value, attention.permute(0, 2, 1))
out = out.view(m_batchsize, C, height, width)
out = self.gamma*out + x
return out
This code , The notes written by the author are very detailed , This function is to establish the relationship between all pixels in the current image , Then multiply this relationship by the input , So as to obtain the weighted effect ! Of course , The residual structure is always a reserved item , Um. , That's it .
External context ( This is the first time in my life , It is worth observing )
class DCRNet(ResNet34Unet):
def __init__(self,
bank_size=20,
num_classes=1,
num_channels=3,
is_deconv=False,
decoder_kernel_size=3,
pretrained=True,
feat_channels=512
):
super().__init__(num_classes=1,
num_channels=3,
is_deconv=False,
decoder_kernel_size=3,
pretrained=True)
self.bank_size = bank_size
self.register_buffer("bank_ptr", torch.zeros(1, dtype=torch.long)) # memory bank pointer
self.register_buffer("bank", torch.zeros(self.bank_size, feat_channels, num_classes)) # memory bank
self.bank_full = False
# =====Attentive Cross Image Interaction==== #
self.feat_channels = feat_channels
self.L = nn.Conv2d(feat_channels, num_classes, 1)
self.X = conv2d(feat_channels, 512, 3)
self.phi = conv1d(512, 256)
self.psi = conv1d(512, 256)
self.delta = conv1d(512, 256)
self.rho = conv1d(256, 512)
self.g = conv2d(512 + 512, 512, 1)
# =========Dual Attention========== #
self.sa_head = PAM_Module(feat_channels)
#=========Attention Fusion=========#
self.fusion = nn.Conv2d(feat_channels, feat_channels, 1)
#==Initiate the pointer of bank buffer==#
def init(self):
self.bank_ptr[0] = 0
self.bank_full = False
@torch.no_grad() # This is very important !!!!
def update_bank(self, x):
ptr = int(self.bank_ptr)
batch_size = x.shape[0]
vacancy = self.bank_size - ptr
if batch_size >= vacancy:
self.bank_full = True
pos = min(batch_size, vacancy)
self.bank[ptr:ptr+pos] = x[0:pos].clone()
# update pointer
ptr = (ptr + pos) % self.bank_size
self.bank_ptr[0] = ptr
def down(self, x):
e1 = self.encoder1(x)
e2 = self.encoder2(e1)
e3 = self.encoder3(e2)
e4 = self.encoder4(e3)
return e4, e3, e2, e1
def up(self, feat, e3, e2, e1, x):
center = self.center(feat)
d4 = self.decoder4(torch.cat([center, e3], 1))
d3 = self.decoder3(torch.cat([d4, e2], 1))
d2 = self.decoder2(torch.cat([d3, e1], 1))
d1 = self.decoder1(torch.cat([d2, x], 1))
f1 = self.finalconv1(d1)
f2 = self.finalconv2(d2)
f3 = self.finalconv3(d3)
f4 = self.finalconv4(d4)
f4 = F.interpolate(f4, scale_factor=8, mode='bilinear', align_corners=True)
f3 = F.interpolate(f3, scale_factor=4, mode='bilinear', align_corners=True)
f2 = F.interpolate(f2, scale_factor=2, mode='bilinear', align_corners=True)
return f4, f3, f2, f1
def region_representation(self, input):
X = self.X(input)
L = self.L(input)
aux_out = L
batch, n_class, height, width = L.shape
l_flat = L.view(batch, n_class, -1)
# M = B * N * HW
M = torch.softmax(l_flat, -1)
channel = X.shape[1]
# X_flat = B * C * HW
X_flat = X.view(batch, channel, -1)
# f_k = B * C * N
f_k = (M @ X_flat.transpose(1, 2)).transpose(1, 2)
return aux_out, f_k, X_flat, X
def attentive_interaction(self, bank, X_flat, X):
batch, n_class, height, width = X.shape
# query = S * C
query = self.phi(bank).squeeze(dim=2)
# key: = B * C * HW
key = self.psi(X_flat)
# logit = HW * S * B (cross image relation)
logit = torch.matmul(query, key).transpose(0,2)
# attn = HW * S * B
attn = torch.softmax(logit, 2) ##softmax Correct dimension
# delta = S * C
delta = self.delta(bank).squeeze(dim=2)
# attn_sum = B * C * HW
attn_sum = torch.matmul(attn.transpose(1,2), delta).transpose(1,2)
# x_obj = B * C * H * W
X_obj = self.rho(attn_sum).view(batch, -1, height, width)
concat = torch.cat([X, X_obj], 1)
out = self.g(concat)
return out
def forward(self, x, flag='train'):
batch_size = x.shape[0]
#=== Stem ===#
x = self.firstconv(x)
x = self.firstbn(x)
x = self.firstrelu(x)
x_ = self.firstmaxpool(x)
#=== Encoder ===#
e4, e3, e2, e1 = self.down(x_)
#=== Attentive Cross Image Interaction ===#
aux_out, patch, feats_flat, feats = self.region_representation(e4)
if flag == 'train':
self.update_bank(patch)
ptr = int(self.bank_ptr)
if self.bank_full == True:
feature_aug = self.attentive_interaction(self.bank, feats_flat, feats)
else:
feature_aug = self.attentive_interaction(self.bank[0:ptr], feats_flat, feats)
elif flag == 'test':
feature_aug = self.attentive_interaction(patch, feats_flat, feats)
#=== Dual Attention ===#
sa_feat = self.sa_head(e4)
#=== Fusion ===#
feats = sa_feat + feature_aug
#=== Decoder ===#
f4, f3, f2, f1 = self.up(feats, e3, e2, e1, x)
aux_out = F.interpolate(aux_out, scale_factor=32, mode='bilinear', align_corners=True)
return aux_out, f4, f3, f2, f1
experimental analysis
The experimental part mainly includes the following aspects :
| Dataset name | Number of images | train | valid | test |
|---|---|---|---|---|
| EndoScene | 912 | 548 | 182 | 182 |
| Kvasir-SEG | 1000 | 600 | 200 | 200 |
| PICCOLO | 3433 | 2203 | 897 | 333 |
| equipment | Learning rate | epoches | batchsize | memory size |
|---|---|---|---|---|
| NVIDIA RTX 2080Ti | 1e-4 | 150 | 4 | 20(Kvasir) / 40(E & P) |


From the visual and tabular data , We can see the validity of this model !

For these two classical models , Has a good improvement , The design of the model and the rationality of the internal and external context reasoning system are explained .
Discuss
The biggest highlight of this article should be the external memory Set up , For the architecture of the whole model , We should learn this kind of implicit classification thought and idea , So is the mechanism of the so-called external context module !
Cheeky , Want a like collection , Thank you for your support !!!
边栏推荐
- How to resolve kernel errors? Solution to kernel error of win11 system
- 1 goal, 3 fields, 6 factors and 9 links of digital transformation
- [C #] explain the difference between value type and reference type
- 毕业设计-基于Unity的餐厅经营游戏的设计与开发(附源码、开题报告、论文、答辩PPT、演示视频,带数据库)
- I just bought the ADB MySQL service. Every time I do an operation, such as creating a table, this problem will pop up. What is the problem?
- The amazing nanopc-t4 (rk3399) is used as the initial configuration and related applications of the workstation
- 行业分析| 快对讲,楼宇对讲
- leetcode 1689. Partitioning into minimum number of deci binary numbers
- Idea merge other branches into dev branch
- C language file operation
猜你喜欢

SQL Server2019 新建 SQL Server身份验证用户名 并登录

论文3 VScode&texlive&SumatraPDF打造完美书写论文工具

pd.cut 区间参数设定之前后区别

There are thousands of roads. Why did this innovative storage company choose this one?

Leetcode 周赛299

微博评论的高性能高可用计算架构方案

Win11如何给系统盘瘦身?Win11系统盘瘦身方法

C#连接数据库完成增删改查操作
![[unity3d] camera follow](/img/11/6309450f2b3ef33df558104549dc4c.png)
[unity3d] camera follow

《数字经济全景白皮书》消费金融数字化篇 重磅发布
随机推荐
Building tin with point cloud
PCL 计算平面三角形外接圆的圆心和半径
视频压缩处理之ffmpeg用法
Technical methodology of new AI engine under the data infrastructure upgrade window
async-validator.js数据校验器
秋招经验分享 | 银行笔面试该怎么准备
论文3 VScode&texlive&SumatraPDF打造完美书写论文工具
Group programming TIANTI competition exercise - continuously updating
About covariance and correlation
mysql全解 Ubuntu/win10
How to remove dataframe field column names
Anonymous function this pointing and variable promotion
智能计算系统1 环境搭建
《数字经济全景白皮书》消费金融数字化篇 重磅发布
春风动力携手华为打造智慧园区标杆,未来工厂创新迈上新台阶
Try except add auxiliary new column
C#连接数据库完成增删改查操作
华为云OneMeeting告诉你全场景会议这么开!
电脑如何检查驱动程序是否正常
带你手把手实现grafana双轴图