Duplex Contextual Relation Network for Polyp Segmentation
2022-06-28 19:20:00 【sigmoidAndRELU】
Colonoscopy image segmentation paper reading
Paper overview
Paper title: Duplex Contextual Relation Network for Polyp Segmentation (ISBI 2022)
Affiliation: Beijing University of Posts and Telecommunications
Authors: Zijin Yin et al.
Code: https://github.com/PRIS-CV/DCRNet/blob/master/lib/DCRNet.py
Abstract
Automatic polyp segmentation in colonoscopy plays a key role in the early diagnosis of colorectal cancer (CRC). However, the diversity of polyp images greatly increases the difficulty of accurate segmentation. Existing work mainly focuses on learning contextual information within a single image, but fails to exploit the visual patterns that polyps share across images. This paper explores contextual dependencies from the perspective of the whole dataset and proposes a duplex contextual relation network (DCRNet) that captures context both within and across images. Based on these two kinds of similarity, the features of each input region can be enhanced by the embeddings of contextual regions. To store the region embeddings from previous images during training, an episodic memory is designed and operated as a queue. The method is evaluated on EndoScene, Kvasir-SEG, and the recently released large-scale PICCOLO dataset. Experimental results show that DCRNet outperforms state-of-the-art methods on widely used evaluation metrics.
Contributions:
1. Enhance the features of each region with the embeddings of contextual regions;
2. Design an episodic memory that operates as a queue;
3. Propose DCRNet;
4. The model performs well on several polyp segmentation datasets.
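
To make contribution 2 concrete, here is a minimal sketch of a fixed-size memory written with plain Python lists. It follows the pointer arithmetic of the `update_bank` method in the author's code further down. The class name `RingMemory` and its method names are my own (hypothetical), and real entries would be feature tensors rather than strings.

```python
class RingMemory:
    """Fixed-size memory that overwrites its oldest slots (hypothetical helper)."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.ptr = 0        # index of the next slot to write
        self.full = False   # True once the write pointer has reached the end

    def update(self, items):
        vacancy = self.size - self.ptr
        if len(items) >= vacancy:
            self.full = True
        # Only the items that fit before the end of the buffer are stored;
        # the rest of the batch is dropped, as in update_bank.
        n = min(len(items), vacancy)
        self.slots[self.ptr:self.ptr + n] = items[:n]
        self.ptr = (self.ptr + n) % self.size

    def valid(self):
        """Entries that may be attended to: all slots once full, else the written prefix."""
        return list(self.slots) if self.full else self.slots[:self.ptr]


mem = RingMemory(5)
mem.update(["a", "b", "c", "d"])   # four slots written, memory not yet full
mem.update(["e", "f", "g"])        # only "e" fits; the pointer wraps back to 0
mem.update(["h", "i"])             # overwrites the two oldest entries
```

One consequence of this design is that a batch arriving when the buffer is almost full is only partly stored, which keeps the update simple at the cost of skipping a few embeddings.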
Introduction
In the diagnosis and treatment of colorectal cancer, analyzing the polyp region is a key step, and polypectomy is a direct way to prevent and treat early colorectal cancer. Colonoscopy images clearly show the inside of the patient's colon, but localizing and segmenting polyps is still difficult for two reasons: 1. polyps vary widely in appearance; 2. the boundary between a polyp and the surrounding colonic mucosa is often blurry. As shown in the figure:
From the images we can see that some polyps are obvious, such as (a) and (b), where the raised part is the polyp. (d) is very pronounced, while (c) is hard to spot unless you look carefully.
Related work
Here is a brief look at existing work:
1. Multi-scale feature extraction networks: ACSNet (MICCAI 2020) combines context information with local detail to cope with the diversity of polyp features.
PraNet uses multi-scale feature aggregation, extracts a contour map from local features, and refines the segmentation map through upsampling.
2. Using auxiliary information to constrain the segmentation result: SFANet (MICCAI 2019) uses area-boundary constraints to guide selective feature aggregation and improve segmentation accuracy.
Key point: these works all seem to look for segmentation features within a single image. My reading is that, in that setting, the model cannot relate an image to other lesions that look similar and adapt its segmentation accordingly. If it could, the model would in effect be classifying polyp images implicitly: obvious lesions and simple images are segmented simply, while complex or inconspicuous images get special treatment. That makes a lot of sense!
So this paper introduces a mechanism called episodic memory!
Theoretical support: "Content-based medical image retrieval of CT images of liver lesions using manifold learning" demonstrated the value of retrieving similar cases from other images when analyzing lesions in radiology.
Related results: the idea has also been used in metric learning.
Therefore, this paper adopts the idea and studies feature relations both across images and within an image, from the perspective of the whole dataset.
Summary of the work:
1. Intra-image context module
2. Cross-image context module
Both modules are plug and play.
Model structure
First, the figure:

The network framework consists of three parts: an encoder, a decoder, and a module that processes the deepest (bottleneck) features.
The encoder-decoder used in this paper is a UNet based on ResNet34, so I will not go into detail here. Let's go straight to the main part!
Intra-image context

    import torch
    from torch.nn import Module, Conv2d, Parameter, Softmax


    class PAM_Module(Module):
        """Position attention module"""
        # Ref from SAGAN

        def __init__(self, in_dim):
            super(PAM_Module, self).__init__()
            self.chanel_in = in_dim

            # 1x1 convolutions producing query, key and value projections
            self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
            self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1)
            self.value_conv = Conv2d(in_channels=in_dim, out_channels=in_dim, kernel_size=1)
            # learnable residual scale, initialized to zero
            self.gamma = Parameter(torch.zeros(1))

            self.softmax = Softmax(dim=-1)

        def forward(self, x):
            """
            inputs:
                x: input feature maps (B x C x H x W)
            returns:
                out: attention value + input feature
                attention: B x (HxW) x (HxW)
            """
            m_batchsize, C, height, width = x.size()
            proj_query = self.query_conv(x).view(m_batchsize, -1, width*height).permute(0, 2, 1)
            proj_key = self.key_conv(x).view(m_batchsize, -1, width*height)
            # pairwise similarity between all positions: B x (HxW) x (HxW)
            energy = torch.bmm(proj_query, proj_key)
            attention = self.softmax(energy)
            proj_value = self.value_conv(x).view(m_batchsize, -1, width*height)

            # weighted sum of values for every position
            out = torch.bmm(proj_value, attention.permute(0, 2, 1))
            out = out.view(m_batchsize, C, height, width)

            # residual connection
            out = self.gamma*out + x
            return out

The author's comments in this code are quite detailed. The module builds the relationship between every pair of pixel positions in the current feature map, then uses that relationship to weight the projected input features. And of course the residual connection is still there: the attended output is scaled by the learnable `gamma` and added back to the input. That's it.
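
The weighting described above can be sketched without PyTorch on a tiny example. This is only an illustration under simplifying assumptions: the function name `position_attention` is hypothetical, each position is a plain list of numbers, and the 1x1 convolutions that produce the query, key and value in `PAM_Module` are left out, so the caller passes those three in directly.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def position_attention(query, key, value, x, gamma):
    """Return gamma * attended_value + x for every position.

    query, key, value, x: lists with one feature vector per position.
    """
    n = len(x)
    channels = len(value[0])
    out = []
    for i in range(n):
        # similarity of position i to every position j
        scores = [sum(q * k for q, k in zip(query[i], key[j])) for j in range(n)]
        weights = softmax(scores)
        # weighted sum of the value vectors
        attended = [sum(weights[j] * value[j][c] for j in range(n)) for c in range(channels)]
        # residual connection scaled by gamma
        out.append([gamma * a + xi for a, xi in zip(attended, x[i])])
    return out


x = [[1.0, 2.0], [3.0, 4.0]]
identity = position_attention(x, x, x, x, gamma=0.0)      # gamma = 0 leaves x unchanged
zeros = [[0.0, 0.0], [0.0, 0.0]]
averaged = position_attention(zeros, x, x, x, gamma=1.0)  # uniform weights: mean of values + x
```

Because `gamma` starts at zero in `PAM_Module`, the module initially passes its input through unchanged and only gradually mixes in the attended features as `gamma` is learned.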
Cross-image context (this is the first time I have seen this, so it is worth a close look)

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # ResNet34Unet, conv1d and conv2d are helpers defined elsewhere in the author's repository.


    class DCRNet(ResNet34Unet):
        def __init__(self,
                     bank_size=20,
                     num_classes=1,
                     num_channels=3,
                     is_deconv=False,
                     decoder_kernel_size=3,
                     pretrained=True,
                     feat_channels=512
                     ):
            # the parent is built with these fixed values, as in the original code
            super().__init__(num_classes=1,
                             num_channels=3,
                             is_deconv=False,
                             decoder_kernel_size=3,
                             pretrained=True)

            self.bank_size = bank_size
            self.register_buffer("bank_ptr", torch.zeros(1, dtype=torch.long))  # memory bank pointer
            self.register_buffer("bank", torch.zeros(self.bank_size, feat_channels, num_classes))  # memory bank
            self.bank_full = False

            # =====Attentive Cross Image Interaction==== #
            self.feat_channels = feat_channels
            self.L = nn.Conv2d(feat_channels, num_classes, 1)
            self.X = conv2d(feat_channels, 512, 3)
            self.phi = conv1d(512, 256)
            self.psi = conv1d(512, 256)
            self.delta = conv1d(512, 256)
            self.rho = conv1d(256, 512)
            self.g = conv2d(512 + 512, 512, 1)

            # =========Dual Attention========== #
            self.sa_head = PAM_Module(feat_channels)

            # =========Attention Fusion========= #
            self.fusion = nn.Conv2d(feat_channels, feat_channels, 1)

        # ==Initiate the pointer of bank buffer== #
        def init(self):
            self.bank_ptr[0] = 0
            self.bank_full = False

        @torch.no_grad()  # This is very important!
        def update_bank(self, x):
            ptr = int(self.bank_ptr)
            batch_size = x.shape[0]
            vacancy = self.bank_size - ptr
            if batch_size >= vacancy:
                self.bank_full = True
            pos = min(batch_size, vacancy)
            self.bank[ptr:ptr+pos] = x[0:pos].clone()
            # update pointer
            ptr = (ptr + pos) % self.bank_size
            self.bank_ptr[0] = ptr

        def down(self, x):
            e1 = self.encoder1(x)
            e2 = self.encoder2(e1)
            e3 = self.encoder3(e2)
            e4 = self.encoder4(e3)
            return e4, e3, e2, e1

        def up(self, feat, e3, e2, e1, x):
            center = self.center(feat)
            d4 = self.decoder4(torch.cat([center, e3], 1))
            d3 = self.decoder3(torch.cat([d4, e2], 1))
            d2 = self.decoder2(torch.cat([d3, e1], 1))
            d1 = self.decoder1(torch.cat([d2, x], 1))

            f1 = self.finalconv1(d1)
            f2 = self.finalconv2(d2)
            f3 = self.finalconv3(d3)
            f4 = self.finalconv4(d4)

            # bring the deep-supervision outputs to a common resolution
            f4 = F.interpolate(f4, scale_factor=8, mode='bilinear', align_corners=True)
            f3 = F.interpolate(f3, scale_factor=4, mode='bilinear', align_corners=True)
            f2 = F.interpolate(f2, scale_factor=2, mode='bilinear', align_corners=True)
            return f4, f3, f2, f1

        def region_representation(self, input):
            X = self.X(input)
            L = self.L(input)
            aux_out = L
            batch, n_class, height, width = L.shape
            l_flat = L.view(batch, n_class, -1)
            # M = B * N * HW
            M = torch.softmax(l_flat, -1)
            channel = X.shape[1]
            # X_flat = B * C * HW
            X_flat = X.view(batch, channel, -1)
            # f_k = B * C * N
            f_k = (M @ X_flat.transpose(1, 2)).transpose(1, 2)
            return aux_out, f_k, X_flat, X

        def attentive_interaction(self, bank, X_flat, X):
            batch, n_class, height, width = X.shape
            # query = S * C
            query = self.phi(bank).squeeze(dim=2)
            # key: = B * C * HW
            key = self.psi(X_flat)
            # logit = HW * S * B (cross image relation)
            logit = torch.matmul(query, key).transpose(0, 2)
            # attn = HW * S * B
            attn = torch.softmax(logit, 2)  # softmax over the last dimension
            # delta = S * C
            delta = self.delta(bank).squeeze(dim=2)
            # attn_sum = B * C * HW
            attn_sum = torch.matmul(attn.transpose(1, 2), delta).transpose(1, 2)
            # x_obj = B * C * H * W
            X_obj = self.rho(attn_sum).view(batch, -1, height, width)
            concat = torch.cat([X, X_obj], 1)
            out = self.g(concat)
            return out

        def forward(self, x, flag='train'):
            batch_size = x.shape[0]

            # === Stem === #
            x = self.firstconv(x)
            x = self.firstbn(x)
            x = self.firstrelu(x)
            x_ = self.firstmaxpool(x)

            # === Encoder === #
            e4, e3, e2, e1 = self.down(x_)

            # === Attentive Cross Image Interaction === #
            aux_out, patch, feats_flat, feats = self.region_representation(e4)
            if flag == 'train':
                self.update_bank(patch)
                ptr = int(self.bank_ptr)
                if self.bank_full:
                    feature_aug = self.attentive_interaction(self.bank, feats_flat, feats)
                else:
                    # only the slots written so far hold valid embeddings
                    feature_aug = self.attentive_interaction(self.bank[0:ptr], feats_flat, feats)
            elif flag == 'test':
                # at test time the current batch's own region embeddings act as the bank
                feature_aug = self.attentive_interaction(patch, feats_flat, feats)
            else:
                raise ValueError("flag must be 'train' or 'test'")

            # === Dual Attention === #
            sa_feat = self.sa_head(e4)

            # === Fusion === #
            feats = sa_feat + feature_aug

            # === Decoder === #
            f4, f3, f2, f1 = self.up(feats, e3, e2, e1, x)
            aux_out = F.interpolate(aux_out, scale_factor=32, mode='bilinear', align_corners=True)
            return aux_out, f4, f3, f2, f1

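
Two ideas in the code above are easier to see on toy data: pooling a feature map into one region embedding with softmax weights (what `region_representation` does for a single class), and letting a pixel read from a memory of such embeddings. The sketch below uses plain Python lists and hypothetical names (`region_embedding`, `attend_to_memory`). It leaves out the learned projections `phi`, `psi`, `delta`, `rho` and `g`, and it takes the softmax over memory slots, which is my assumption about the intent rather than a copy of the author's tensor layout.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def region_embedding(features, logits):
    """Pool per-pixel features into one vector, weighted by softmax(logits).

    features: one feature vector (length C) per pixel
    logits:   one foreground score per pixel
    """
    weights = softmax(logits)
    channels = len(features[0])
    return [sum(w * f[c] for w, f in zip(weights, features)) for c in range(channels)]


def attend_to_memory(pixel, memory):
    """Weighted sum of memory vectors, weighted by dot-product similarity to `pixel`."""
    scores = [sum(p * m for p, m in zip(pixel, slot)) for slot in memory]
    weights = softmax(scores)
    channels = len(memory[0])
    return [sum(w * slot[c] for w, slot in zip(weights, memory)) for c in range(channels)]


# Two pixels with equal scores contribute equally to the region embedding.
embedding = region_embedding([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

# A pixel that resembles the first memory slot mostly reads from that slot.
context = attend_to_memory([1.0, 0.0], [[5.0, 0.0], [0.0, 5.0]])
```

In the author's network the resulting context is concatenated with the pixel's own features and passed through a convolution, so the cross-image information is added to the image's features rather than replacing them.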
Experimental analysis
The experimental setup is as follows:
| Dataset | Number of images | Train | Valid | Test |
|---|---|---|---|---|
| EndoScene | 912 | 548 | 182 | 182 |
| Kvasir-SEG | 1000 | 600 | 200 | 200 |
| PICCOLO | 3433 | 2203 | 897 | 333 |

| Device | Learning rate | Epochs | Batch size | Memory size |
|---|---|---|---|---|
| NVIDIA RTX 2080Ti | 1e-4 | 150 | 4 | 20 (Kvasir-SEG) / 40 (EndoScene and PICCOLO) |
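
A few numbers follow directly from the two tables, and a short script makes them explicit. This is only a back-of-the-envelope sketch: the helper names are hypothetical, the iteration count assumes the last partial batch of an epoch is kept, and the number of batches held in memory assumes one memory entry per image, which is how `update_bank` fills the bank when `num_classes=1`.

```python
import math

# (total, train, valid, test) as listed in the dataset table
splits = {
    "EndoScene": (912, 548, 182, 182),
    "Kvasir-SEG": (1000, 600, 200, 200),
    "PICCOLO": (3433, 2203, 897, 333),
}
batch_size = 4
memory_size = {"Kvasir-SEG": 20, "EndoScene": 40, "PICCOLO": 40}


def split_fractions(total, train, valid, test):
    """Fraction of the dataset in each split, after checking the counts add up."""
    assert train + valid + test == total
    return tuple(round(part / total, 3) for part in (train, valid, test))


def summarize(name):
    total, train, valid, test = splits[name]
    return {
        "fractions": split_fractions(total, train, valid, test),
        "iterations_per_epoch": math.ceil(train / batch_size),
        "batches_in_memory": memory_size[name] // batch_size,
    }


for name in splits:
    print(name, summarize(name))
```

Under these assumptions the memory spans the region embeddings of the last 5 batches on Kvasir-SEG and the last 10 batches on EndoScene and PICCOLO.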


Both the visual results and the tables show that the model is effective!

Added to these two classic models, the modules give a clear improvement, which supports both the model design and the soundness of the intra-image and cross-image context reasoning.
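
For readers who want to reproduce this kind of comparison, here is a minimal sketch of two overlap metrics that are commonly reported for polyp segmentation, Dice and IoU. That these are the metrics behind the tables above is my assumption, since the post does not name them. The function name `dice_and_iou` and the flat-list mask format are hypothetical.

```python
def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU for two binary masks given as flat lists of 0/1."""
    inter = sum(p * t for p, t in zip(pred, target))
    pred_sum, target_sum = sum(pred), sum(target)
    # eps avoids division by zero when both masks are empty
    dice = (2 * inter + eps) / (pred_sum + target_sum + eps)
    iou = (inter + eps) / (pred_sum + target_sum - inter + eps)
    return dice, iou


# Prediction marks two pixels, ground truth marks one of them.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0])
```

Dice rewards overlap more generously than IoU for the same masks, so the two numbers should be compared like for like across methods.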
Discussion
The biggest highlight of this paper is the external memory. Looking at the architecture as a whole, the idea worth learning from is this kind of implicit classification, and that is also what the so-called cross-image context module is really doing!
If you found this useful, a like or a bookmark would be appreciated. Thank you for your support!