BiSeNet v2
2022-07-29 08:07:00 【00000cj】
Paper: BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
The Detail Path and Semantic Path in v2 correspond to the Spatial Path and Context Path in v1, respectively.
Compared with v1, there are two main improvements:
- The time-consuming cross-layer connections are removed, which simplifies the model structure.
- The overall architecture is redesigned. Specifically: (1) the Detail Path is deepened to encode more detail; (2) lightweight components based on depthwise separable convolution are designed for the Semantic Path; (3) an effective aggregation layer is proposed to strengthen the connection between the two paths.
Bilateral Segmentation Network
The overall structure is shown in the figure below
The specific structure of detail branch and semantic branch is shown in the following table
Detail Branch
The detail branch is responsible for extracting spatial detail, i.e. low-level information, so it needs rich channel capacity (a large number of channels) to encode that detail. Because it focuses only on low-level information, it can be a shallow structure with small strides. In short, the detail branch uses wide channels and shallow layers. Residual connections are best avoided here: the feature maps are large and the channels wide, so the extra memory access cost would reduce speed.
As shown in Table 1, the detail branch contains 3 stages. Each layer is a convolution followed by BN and ReLU; the printout below shows 2, 3, and 3 such layers in the three stages. The first convolution of every stage has stride=2, so the output feature map of this branch is 1/8 the size of the model input.
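The 1/8 figure can be checked with a little arithmetic. The following is a minimal sketch, not part of the original implementation: `conv_out_size`, `DETAIL_STAGES`, and `detail_branch_shape` are hypothetical helper names, and the (channels, stride) pairs are copied from the module printout of the detail branch.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution (floor division, no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (out_channels, stride) of each 3x3 conv, grouped by stage.
# Values are read off the DetailBranch printout; the names are illustrative.
DETAIL_STAGES = [
    [(64, 2), (64, 1)],
    [(64, 2), (64, 1), (64, 1)],
    [(128, 2), (128, 1), (128, 1)],
]


def detail_branch_shape(height, width):
    """Trace (channels, H, W) through all conv layers of the detail branch."""
    channels = 3
    for stage in DETAIL_STAGES:
        for out_channels, stride in stage:
            height = conv_out_size(height, stride=stride)
            width = conv_out_size(width, stride=stride)
            channels = out_channels
    return channels, height, width


print(detail_branch_shape(480, 480))  # (128, 60, 60)
```

Three stride-2 convolutions give an overall stride of 8, so a 480x480 input ends up as a 60x60 map with 128 channels.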
The specific structure of the detail branch is as follows
DetailBranch(
  (detail_branch): ModuleList(
    (0): Sequential(
      (0): ConvModule(
        (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (1): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (1): Sequential(
      (0): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (1): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (2): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (2): Sequential(
      (0): ConvModule(
        (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (1): ConvModule(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (2): ConvModule(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
)
Semantic Branch
To obtain a large receptive field at low computational cost, the authors draw on lightweight networks such as Xception, MobileNet, and ShuffleNet to design the semantic branch. In contrast to the detail branch, which is shallow with many channels, the semantic branch is deep with few channels, as follows
Stem Block
The authors adopt the Stem Block as the first stage of the semantic branch, as shown in figure (a) below. It uses two different downsampling methods to shrink the feature representation and then concatenates the outputs of the two branches. This structure has low computational cost and effective feature expression ability.
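To make the two downsampling paths concrete, here is a minimal shape-tracing sketch. The helper names `conv_out_size` and `stem_block_shape` are hypothetical; the channel counts and strides are taken from the StemBlock printout, and max pooling is assumed to follow the same floor formula as a convolution (ceil_mode=False).

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def stem_block_shape(height, width):
    """Return the (channels, H, W) shapes after concatenation and after fusion."""
    # Shared first conv: 3x3, stride 2, 3 -> 16 channels.
    h = conv_out_size(height, 3, 2, 1)
    w = conv_out_size(width, 3, 2, 1)
    # Path 1: 1x1 conv (16 -> 8), then 3x3 conv with stride 2 (8 -> 16).
    conv_path = (16, conv_out_size(h, 3, 2, 1), conv_out_size(w, 3, 2, 1))
    # Path 2: 3x3 max pooling with stride 2 keeps the 16 channels.
    pool_path = (16, conv_out_size(h, 3, 2, 1), conv_out_size(w, 3, 2, 1))
    # Both paths must agree spatially before channel-wise concatenation.
    assert conv_path[1:] == pool_path[1:]
    concat = (conv_path[0] + pool_path[0],) + conv_path[1:]
    # Final 3x3 conv fuses the 32 concatenated channels back to 16.
    fused = (16,) + concat[1:]
    return concat, fused


print(stem_block_shape(480, 480))  # ((32, 120, 120), (16, 120, 120))
```

The concatenated tensor has 32 channels, which is why the fusing convolution in the printout takes 32 input channels.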
The specific structure of the Stem Block is as follows
(stage1): StemBlock(
  (conv_first): ConvModule(
    (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
  (convs): Sequential(
    (0): ConvModule(
      (conv): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (1): ConvModule(
      (conv): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
  )
  (pool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (fuse_last): ConvModule(
    (conv): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
)
Gather-and-Expansion Layer
Except for the Stem Block at the start and the Context Embedding Block at the end, every intermediate stage of the semantic branch is composed of GE layers, as shown in the figure below
A GE layer consists of (1) a 3x3 convolution that efficiently aggregates feature responses and expands them to a higher-dimensional space, (2) a 3x3 depthwise convolution that extracts features independently on each channel, and (3) a 1x1 convolution that projects the output of the depthwise convolution back to a low-channel space.
When stride=2, two 3x3 depthwise convolutions are used to further enlarge the receptive field, and a depthwise separable convolution is used as the shortcut.
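As a rough illustration of why the depthwise expansion is cheap, the sketch below counts convolution weights in a stride-1 GE layer. It is only an estimate under stated assumptions: `conv_weights` and `ge_layer_s1_weights` are hypothetical helpers, biases and BN parameters are ignored, and the expansion factor of 6 is inferred from the channel counts in the implementation's module printout (16 to 96, 32 to 192).

```python
def conv_weights(in_ch, out_ch, kernel, groups=1):
    """Number of weights in a bias-free 2D convolution."""
    return kernel * kernel * (in_ch // groups) * out_ch


def ge_layer_s1_weights(channels, expansion=6):
    """Per-layer weight counts of a stride-1 GE layer with equal in/out channels."""
    mid = channels * expansion
    return {
        "conv1_3x3": conv_weights(channels, channels, 3),
        # Depthwise expansion: groups == in_channels, so each output
        # channel sees exactly one input channel.
        "dwconv_3x3": conv_weights(channels, mid, 3, groups=channels),
        "conv2_1x1": conv_weights(mid, channels, 1),
    }


counts = ge_layer_s1_weights(32)
print(counts)                 # {'conv1_3x3': 9216, 'dwconv_3x3': 1728, 'conv2_1x1': 6144}
print(sum(counts.values()))   # 17088
print(conv_weights(32, 192, 3))  # 55296, a dense 3x3 conv doing the same expansion
```

Expanding 32 channels to 192 with a depthwise 3x3 convolution needs 1728 weights, against 55296 for a dense 3x3 convolution with the same input and output widths.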
Stage S3 of the semantic branch (printed as stage2 below, because the implementation names the Stem Block stage1) is structured as follows. It contains 2 GE layers: the first with stride=2, the second with stride=1
(stage2): Sequential(
  (0): GELayer(
    (conv1): ConvModule(
      (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (dwconv): Sequential(
      (0): ConvModule(
        (conv): Conv2d(16, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
        (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ConvModule(
        (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)
        (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (shortcut): Sequential(
      (0): DepthwiseSeparableConvModule(
        (depthwise_conv): ConvModule(
          (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (pointwise_conv): ConvModule(
          (conv): Conv2d(16, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (conv2): Sequential(
      (0): ConvModule(
        (conv): Conv2d(96, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (act): ReLU()
  )
  (1): GELayer(
    (conv1): ConvModule(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (dwconv): Sequential(
      (0): ConvModule(
        (conv): Conv2d(32, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
        (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (conv2): Sequential(
      (0): ConvModule(
        (conv): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (act): ReLU()
  )
)
Context Embedding Block
In the last stage of the semantic branch, the authors use a CE (Context Embedding) layer in place of a GE layer as the final layer. Its structure is shown in Figure 4(b): global average pooling and a residual connection are used to embed global context information efficiently.
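The core idea, pooling each channel to a single value and adding it back to every spatial position, can be sketched in plain Python. This is a deliberately simplified illustration, not the CE block itself: it omits the BN, the 1x1 convolution on the pooled vector, and the final 3x3 convolution, and the function names are hypothetical.

```python
def global_average_pool(grid):
    """Average one channel's H x W grid down to a single value."""
    values = [v for row in grid for v in row]
    return sum(values) / len(values)


def context_embed(channels):
    """Add each channel's global average back to every position (residual add).

    `channels` is a list of H x W grids (lists of lists), one per channel.
    """
    out = []
    for grid in channels:
        context = global_average_pool(grid)
        # The 1x1 context value is broadcast over the whole H x W grid.
        out.append([[v + context for v in row] for row in grid])
    return out


x = [[[1.0, 3.0], [5.0, 7.0]]]  # one channel, 2 x 2
print(context_embed(x))  # [[[5.0, 7.0], [9.0, 11.0]]]
```

Every position receives the same per-channel context term, which is how a 1x1 pooled descriptor can inform a full-resolution feature map. The module printout of the block in the implementation is: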
(stage4_CEBlock): CEBlock(
  (gap): Sequential(
    (0): AdaptiveAvgPool2d(output_size=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv_gap): ConvModule(
    (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
  (conv_last): ConvModule(
    (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
)
Bilateral Guided Aggregation
The detail branch and the semantic branch produce different kinds of features: the detail branch extracts low-level detail features, while the semantic branch extracts high-level semantic features. Simply fusing them by summation or concatenation is therefore not ideal. The authors propose a bilateral guided aggregation (BGA) layer to fuse the complementary information from the two branches, using the contextual information of the semantic branch to guide the feature response of the detail branch. With guidance at different scales, feature representations at different scales are obtained, which effectively encodes multi-scale information. The specific structure is shown in the following figure
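The guidance mechanism can be illustrated on 1-D toy signals. This is a sketch under simplifying assumptions rather than the actual layer: the convolutions that produce the four inputs and the final 3x3 convolution are left out, nearest-neighbour upsampling stands in for bilinear resizing, and all function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(features, guidance):
    """Element-wise product of features with sigmoid(guidance)."""
    return [f * sigmoid(g) for f, g in zip(features, guidance)]


def upsample_nearest(values, factor):
    """Nearest-neighbour upsampling of a 1-D signal (stand-in for bilinear)."""
    return [v for v in values for _ in range(factor)]


def bga_fuse_1d(detail_hi, detail_lo, semantic_a, semantic_b, factor):
    # High-resolution path: the semantic guidance is upsampled first,
    # then gates the full-resolution detail features.
    fuse_1 = gate(detail_hi, upsample_nearest(semantic_a, factor))
    # Low-resolution path: downsampled detail features are gated at the
    # semantic resolution, and the result is upsampled afterwards.
    fuse_2 = upsample_nearest(gate(detail_lo, semantic_b), factor)
    # The two guided results are summed.
    return [a + b for a, b in zip(fuse_1, fuse_2)]


out = bga_fuse_1d(
    detail_hi=[2.0, 2.0, 4.0, 4.0],
    detail_lo=[2.0, 4.0],
    semantic_a=[0.0, 0.0],
    semantic_b=[0.0, 0.0],
    factor=2,
)
print(out)  # [2.0, 2.0, 4.0, 4.0]
```

With zero guidance the sigmoid is 0.5 everywhere, so each path passes half of the detail signal; larger semantic responses would let more of the detail features through at that location.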
BGA Code
# Imports as used by MMSegmentation 0.x; module paths may differ in other versions.
import torch
import torch.nn as nn
from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule
from mmcv.runner import BaseModule

from mmseg.ops import resize


class BGALayer(BaseModule):
    """Bilateral Guided Aggregation Layer to fuse the complementary information
    from both Detail Branch and Semantic Branch.

    Args:
        out_channels (int): Number of output channels.
            Default: 128.
        align_corners (bool): align_corners argument of F.interpolate.
            Default: False.
        conv_cfg (dict | None): Config of conv layers.
            Default: None.
        norm_cfg (dict | None): Config of norm layers.
            Default: dict(type='BN').
        act_cfg (dict): Config of activation layers.
            Default: dict(type='ReLU').
        init_cfg (dict or list[dict], optional): Initialization config dict.
            Default: None.
    Returns:
        output (torch.Tensor): Output feature map for Segment heads.
    """

    def __init__(self,
                 out_channels=128,
                 align_corners=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 act_cfg=dict(type='ReLU'),
                 init_cfg=None):
        super(BGALayer, self).__init__(init_cfg=init_cfg)
        self.out_channels = out_channels
        self.align_corners = align_corners
        # Detail branch, full resolution: depthwise separable 3x3 conv.
        self.detail_dwconv = nn.Sequential(
            DepthwiseSeparableConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                dw_norm_cfg=norm_cfg,
                dw_act_cfg=None,
                pw_norm_cfg=None,
                pw_act_cfg=None,
            ))
        # Detail branch, downsampled 4x: stride-2 conv then stride-2 avg pool.
        self.detail_down = nn.Sequential(
            ConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=2,
                padding=1,
                bias=False,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=None),
            nn.AvgPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=False))
        # Semantic branch, to be upsampled to the detail resolution.
        self.semantic_conv = nn.Sequential(
            ConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                bias=False,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=None))
        # Semantic branch, kept at its own (low) resolution.
        self.semantic_dwconv = nn.Sequential(
            DepthwiseSeparableConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                dw_norm_cfg=norm_cfg,
                dw_act_cfg=None,
                pw_norm_cfg=None,
                pw_act_cfg=None,
            ))
        # Final 3x3 conv applied to the sum of the two guided features.
        self.conv = ConvModule(
            in_channels=self.out_channels,
            out_channels=self.out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            inplace=True,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg,
        )

    def forward(self, x_d, x_s):  # (4,128,60,60),(4,128,15,15)
        detail_dwconv = self.detail_dwconv(x_d)  # (4,128,60,60)
        detail_down = self.detail_down(x_d)  # (4,128,15,15)
        semantic_conv = self.semantic_conv(x_s)  # (4,128,15,15)
        semantic_dwconv = self.semantic_dwconv(x_s)  # (4,128,15,15)
        semantic_conv = resize(
            input=semantic_conv,
            size=detail_dwconv.shape[2:],
            mode='bilinear',
            align_corners=self.align_corners)  # (4,128,60,60)
        # Semantic features gate the detail features at both resolutions.
        fuse_1 = detail_dwconv * torch.sigmoid(semantic_conv)  # (4,128,60,60)
        fuse_2 = detail_down * torch.sigmoid(semantic_dwconv)  # (4,128,15,15)
        fuse_2 = resize(
            input=fuse_2,
            size=fuse_1.shape[2:],
            mode='bilinear',
            align_corners=self.align_corners)  # (4,128,60,60)
        output = self.conv(fuse_1 + fuse_2)  # (4,128,60,60)
        return output
Booster Training Strategy
To further improve segmentation accuracy, the authors propose a booster training strategy. It enhances the feature representation during training and can be discarded at inference, so it does not slow down inference. As shown in Figure 3, auxiliary segmentation heads are added at different positions of the semantic branch to provide additional supervision on the model's intermediate outputs, which improves accuracy.
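A minimal sketch of how such a strategy is typically wired up is shown below. It is an illustration only: the function names, the equal weighting via `aux_weight=1.0`, and the plain summation of losses are assumptions, not details confirmed by the text; the count of four auxiliary heads follows the walkthrough of the MMSegmentation implementation.

```python
def total_training_loss(main_loss, aux_losses, aux_weight=1.0):
    """Sum the main head loss and the weighted auxiliary head losses.

    aux_weight is a hypothetical hyperparameter used for illustration.
    """
    return main_loss + aux_weight * sum(aux_losses)


def heads_used(training, num_aux_heads=4):
    """Auxiliary heads are active only in training; inference uses the main head."""
    heads = ["main"]
    if training:
        heads += [f"aux_{i}" for i in range(num_aux_heads)]
    return heads


print(total_training_loss(0.5, [0.25, 0.25, 0.5, 0.5]))  # 2.0
print(heads_used(training=True))   # ['main', 'aux_0', 'aux_1', 'aux_2', 'aux_3']
print(heads_used(training=False))  # ['main']
```

Because the auxiliary heads only contribute gradient signal, dropping them after training leaves the inference graph unchanged.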
Implementation process
Taking the BiSeNet v2 implementation in MMSegmentation as an example, let's walk through the concrete implementation.
Assume batch_size=4 and an input of shape (4, 3, 480, 480).
- The output of the Detail Branch is (4, 128, 60, 60).
- For the Semantic Branch, as shown in Table 1, the output of the Stem Block is (4, 16, 120, 120), the output of S3 is (4, 32, 60, 60), and the output of S4 is (4, 64, 30, 30). S5 yields two outputs: that of its GE layers, (4, 128, 15, 15), and that of the final CE layer, (4, 128, 15, 15). The semantic branch therefore returns a list of 5 outputs. The final CE output and the detail branch output are fed into the BGA layer; the first 4 outputs are used during training as inputs to the auxiliary segmentation heads.
- The output of Bilateral Guided Aggregation is (4, 128, 60, 60).
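The spatial sizes listed above can be reproduced with the standard convolution size formula. The sketch below is a hypothetical helper set, assuming every downsampling step is a 3x3 operation with stride 2 and padding 1, and that each of S3, S4, and S5 halves the resolution exactly once (which matches the shapes in the list).

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a conv or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def semantic_branch_sizes(size):
    """Spatial size after the Stem Block and after stages S3, S4, S5."""
    sizes = {}
    # The Stem Block downsamples twice with stride 2 (overall stride 4).
    size = conv_out_size(conv_out_size(size, stride=2), stride=2)
    sizes["stem"] = size
    # Assumption: each later stage halves the resolution once.
    for name in ("S3", "S4", "S5"):
        size = conv_out_size(size, stride=2)
        sizes[name] = size
    return sizes


def bga_detail_down_size(size):
    """detail_down in BGALayer: stride-2 3x3 conv, then stride-2 3x3 avg pool."""
    return conv_out_size(conv_out_size(size, stride=2), stride=2)


sizes = semantic_branch_sizes(480)
print(sizes)  # {'stem': 120, 'S3': 60, 'S4': 30, 'S5': 15}
print(bga_detail_down_size(60) == sizes["S5"])  # True
```

The last line confirms that the downsampled detail features (60 to 15) line up with the semantic output, which is what allows the element-wise gating inside the BGA layer.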
Experimental Results
Cityscapes

CamVid