BiSeNet v2
2022-07-29 08:07:00 【00000cj】
Paper: BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation
The Detail Path and Semantic Path in v2 correspond to the Spatial Path and Context Path in v1, respectively.
Compared with v1, there are two main improvements:
- The time-consuming cross-layer connections are removed, which simplifies the model structure.
- The overall architecture is redesigned. Specifically: (1) the Detail Path is deepened to encode more detail; (2) lightweight components based on depthwise separable convolution are designed for the Semantic Path; (3) an effective aggregation layer is proposed to strengthen the connection between the two paths.
Bilateral Segmentation Network
The overall structure is shown in the figure below
The specific structure of detail branch and semantic branch is shown in the following table
Detail Branch
The detail branch extracts spatial detail, i.e. low-level information, so it needs rich channel capacity (a large number of channels) to encode it. Because it focuses on low-level information, it should also be a shallow structure with a small stride. In short, the detail branch uses wide channels and few layers. It is also best to avoid residual connections here, since the extra memory access cost reduces speed.
As shown in Table (1), the detail branch contains 3 stages. Every convolution layer is followed by a BN and a ReLU, and the first convolution of every stage has stride=2, so the output feature map of this branch is 1/8 the size of the model input. In the structure printed below, the first stage has 2 convolution layers and the other two stages have 3 each.
The specific structure of the detail branch is as follows
DetailBranch(
  (detail_branch): ModuleList(
    (0): Sequential(
      (0): ConvModule(
        (conv): Conv2d(3, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (1): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (1): Sequential(
      (0): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (1): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (2): ConvModule(
        (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (2): Sequential(
      (0): ConvModule(
        (conv): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (1): ConvModule(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
      (2): ConvModule(
        (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
  )
)
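The 1/8 output size of the detail branch can be checked with the standard convolution output-size formula. The following is a minimal sketch, assuming a 480x480 input (the size used later in this post) and the kernel, stride and padding values from the structure above; the helper name conv_out_size is hypothetical.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1

# Strides of the convolutions in the three detail-branch stages (from the dump).
detail_strides = [[2, 1], [2, 1, 1], [2, 1, 1]]

size = 480  # assumed input height/width
for stage in detail_strides:
    for s in stage:
        size = conv_out_size(size, kernel=3, stride=s, padding=1)

print(size)  # 60, i.e. 480 / 8
```

Only the stride-2 convolution at the start of each stage changes the resolution; the stride-1 layers with padding=1 keep it unchanged.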
Semantic Branch
To obtain a large receptive field at a small computational cost, the authors design the semantic branch by borrowing from lightweight networks such as Xception, MobileNet and ShuffleNet. In contrast to the detail branch, which is shallow with many channels, the semantic branch is deep with few channels. Its components are described below.
Stem Block
The authors adopt a Stem Block as the first stage of the semantic branch, as shown in figure (a) below. It uses two different downsampling methods to shrink the feature representation, then concatenates the outputs of the two branches. This structure has a low computational cost and effective feature expression ability.
The specific structure of the Stem Block is as follows
(stage1): StemBlock(
  (conv_first): ConvModule(
    (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
  (convs): Sequential(
    (0): ConvModule(
      (conv): Conv2d(16, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn): BatchNorm2d(8, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (1): ConvModule(
      (conv): Conv2d(8, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
  )
  (pool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (fuse_last): ConvModule(
    (conv): Conv2d(32, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
)
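To see how the two downsampling paths line up before concatenation, the channel counts and spatial sizes can be traced by hand. This is a rough sketch, assuming a 480x480 input and the layer settings printed above; the helper and variable names are hypothetical.

```python
def out_size(size, kernel, stride, padding):
    """Output length of one spatial dimension for a conv or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

size = 480                                  # assumed input height/width
first_c = 16
first_size = out_size(size, 3, 2, 1)        # conv_first: 3 -> 16 channels, 1/2 size

# Path 1 (convs): 1x1 conv 16 -> 8, then 3x3 stride-2 conv 8 -> 16.
conv_c = 16
conv_size = out_size(out_size(first_size, 1, 1, 0), 3, 2, 1)

# Path 2 (pool): 3x3 stride-2 max pooling keeps the 16 channels.
pool_c = first_c
pool_size = out_size(first_size, 3, 2, 1)

cat_c = conv_c + pool_c                     # concatenation along channels
fused_c = 16                                # fuse_last: 32 -> 16 channels
fused_size = out_size(conv_size, 3, 1, 1)

print(cat_c, (fused_c, fused_size, fused_size))  # 32 (16, 120, 120)
```

Both paths reach the same spatial size, so they can be concatenated into the 32 channels that fuse_last expects, and the block as a whole reduces the input to 1/4 resolution.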
Gather-and-Expansion Layer
Apart from the first Stem Block and the final Context Embedding Block, every stage in the middle of the semantic branch is composed of GE layers, as shown in the figure below
A GE layer consists of (1) a 3x3 convolution that efficiently aggregates feature responses and expands them to a higher-dimensional space; (2) a 3x3 depthwise convolution that extracts features independently on each channel; (3) a 1x1 convolution that projects the output of the depthwise convolution back to a low-channel space.
When stride=2, two 3x3 depthwise convolutions are used to further enlarge the receptive field, and a depthwise separable convolution serves as the shortcut.
The structure of stage S3 of the semantic branch (printed as stage2 in the MMSegmentation structure below) is as follows. It contains 2 GE layers: the first has stride=2 and the second has stride=1
(stage2): Sequential(
  (0): GELayer(
    (conv1): ConvModule(
      (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (dwconv): Sequential(
      (0): ConvModule(
        (conv): Conv2d(16, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
        (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): ConvModule(
        (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=96, bias=False)
        (bn): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (shortcut): Sequential(
      (0): DepthwiseSeparableConvModule(
        (depthwise_conv): ConvModule(
          (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), groups=16, bias=False)
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
        (pointwise_conv): ConvModule(
          (conv): Conv2d(16, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
    )
    (conv2): Sequential(
      (0): ConvModule(
        (conv): Conv2d(96, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (act): ReLU()
  )
  (1): GELayer(
    (conv1): ConvModule(
      (conv): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (activate): ReLU(inplace=True)
    )
    (dwconv): Sequential(
      (0): ConvModule(
        (conv): Conv2d(32, 192, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
        (bn): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (activate): ReLU(inplace=True)
      )
    )
    (conv2): Sequential(
      (0): ConvModule(
        (conv): Conv2d(192, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (act): ReLU()
  )
)
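The benefit of expanding with a depthwise convolution shows up when counting weights. The sketch below tallies the convolution weights of the stride-1 GE layer printed above (32 channels, expansion to 192) and compares them with a hypothetical variant that expands with a dense 3x3 convolution instead. The helper name conv_weights and the dense baseline are illustrative assumptions, not part of the paper or of MMSegmentation.

```python
def conv_weights(in_ch, out_ch, kernel, groups=1):
    """Number of weights in a bias-free Conv2d layer."""
    return out_ch * (in_ch // groups) * kernel * kernel

# Stride-1 GE layer with 32 channels (values taken from the dump).
ge_layer = (
    conv_weights(32, 32, 3)                # conv1: 3x3 convolution
    + conv_weights(32, 192, 3, groups=32)  # dwconv: depthwise expansion to 192
    + conv_weights(192, 32, 1)             # conv2: 1x1 projection back to 32
)

# Hypothetical baseline for comparison only: dense 3x3 expansion + 1x1 projection.
dense_expand = conv_weights(32, 192, 3) + conv_weights(192, 32, 1)

print(ge_layer, dense_expand)  # 17088 61440
```

With groups=32, each of the 192 output channels sees a single input channel, so the expansion step costs 1728 weights rather than 55296.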
Context Embedding Block
In the last stage of the semantic branch, the authors use a CE layer as the final layer in place of a GE layer. Its structure is shown in figure (4)(b). It uses global average pooling and a residual connection to encode global context information efficiently.
(stage4_CEBlock): CEBlock(
  (gap): Sequential(
    (0): AdaptiveAvgPool2d(output_size=(1, 1))
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  )
  (conv_gap): ConvModule(
    (conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
  (conv_last): ConvModule(
    (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (bn): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (activate): ReLU(inplace=True)
  )
)
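The core idea of the CE block, pooling each channel to a single value and adding it back to every position, can be illustrated without any framework. This is a simplified sketch on nested lists: it assumes the pooled value is broadcast-added to the input as the residual connection, and it deliberately omits the BN, the 1x1 convolution and the final 3x3 convolution shown in the structure above. The function name context_embed is hypothetical.

```python
def context_embed(feature):
    """feature: list of channels, each a 2D list (H x W) of floats."""
    out = []
    for channel in feature:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)          # global average pooling
        # Residual connection: broadcast the pooled value over every position.
        out.append([[v + mean for v in row] for row in channel])
    return out

x = [[[1.0, 3.0], [5.0, 7.0]]]    # one channel, 2x2
print(context_embed(x))            # [[[5.0, 7.0], [9.0, 11.0]]]
```

Every position receives the same per-channel summary, which is how a 1x1 pooled vector can carry image-wide context into a feature map.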
Bilateral Guided Aggregation
The detail branch and the semantic branch produce different kinds of features: the detail branch extracts low-level detail features, while the semantic branch extracts high-level semantic features. The two therefore cannot simply be fused by summation or concatenation. The authors propose a bilateral guided aggregation layer to fuse the complementary information from both branches, using the context information of the semantic branch to guide the feature response of the detail branch. With guidance at different scales, feature representations at different scales are captured, which effectively encodes multi-scale information. The specific structure is shown in the following figure
BGA Code
# Imports assumed from MMSegmentation 0.x; module paths differ in later versions.
import torch
import torch.nn as nn
from mmcv.cnn import ConvModule, DepthwiseSeparableConvModule
from mmcv.runner import BaseModule
from mmseg.ops import resize


class BGALayer(BaseModule):
    """Bilateral Guided Aggregation Layer to fuse the complementary information
    from both Detail Branch and Semantic Branch.

    Args:
        out_channels (int): Number of output channels.
            Default: 128.
        align_corners (bool): align_corners argument of F.interpolate.
            Default: False.
        conv_cfg (dict | None): Config of conv layers.
            Default: None.
        norm_cfg (dict | None): Config of norm layers.
            Default: dict(type='BN').
        act_cfg (dict): Config of activation layers.
            Default: dict(type='ReLU').
        init_cfg (dict or list[dict], optional): Initialization config dict.
            Default: None.

    Returns:
        output (torch.Tensor): Output feature map for Segment heads.
    """

    def __init__(self,
                 out_channels=128,
                 align_corners=False,
                 conv_cfg=None,
                 norm_cfg=dict(type='BN'),
                 act_cfg=dict(type='ReLU'),
                 init_cfg=None):
        super(BGALayer, self).__init__(init_cfg=init_cfg)
        self.out_channels = out_channels
        self.align_corners = align_corners
        # Detail branch, full resolution: depthwise separable conv.
        self.detail_dwconv = nn.Sequential(
            DepthwiseSeparableConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                dw_norm_cfg=norm_cfg,
                dw_act_cfg=None,
                pw_norm_cfg=None,
                pw_act_cfg=None,
            ))
        # Detail branch, downsampled 4x: stride-2 conv then stride-2 avg pool.
        self.detail_down = nn.Sequential(
            ConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=2,
                padding=1,
                bias=False,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=None),
            nn.AvgPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=False))
        # Semantic branch, later upsampled to guide the full-resolution detail.
        self.semantic_conv = nn.Sequential(
            ConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                bias=False,
                conv_cfg=conv_cfg,
                norm_cfg=norm_cfg,
                act_cfg=None))
        # Semantic branch, low resolution: guides the downsampled detail.
        self.semantic_dwconv = nn.Sequential(
            DepthwiseSeparableConvModule(
                in_channels=self.out_channels,
                out_channels=self.out_channels,
                kernel_size=3,
                stride=1,
                padding=1,
                dw_norm_cfg=norm_cfg,
                dw_act_cfg=None,
                pw_norm_cfg=None,
                pw_act_cfg=None,
            ))
        # Final 3x3 conv applied to the summed features.
        self.conv = ConvModule(
            in_channels=self.out_channels,
            out_channels=self.out_channels,
            kernel_size=3,
            stride=1,
            padding=1,
            inplace=True,
            conv_cfg=conv_cfg,
            norm_cfg=norm_cfg,
            act_cfg=act_cfg,
        )

    def forward(self, x_d, x_s):  # (4,128,60,60),(4,128,15,15)
        detail_dwconv = self.detail_dwconv(x_d)  # (4,128,60,60)
        detail_down = self.detail_down(x_d)  # (4,128,15,15)
        semantic_conv = self.semantic_conv(x_s)  # (4,128,15,15)
        semantic_dwconv = self.semantic_dwconv(x_s)  # (4,128,15,15)
        semantic_conv = resize(
            input=semantic_conv,
            size=detail_dwconv.shape[2:],
            mode='bilinear',
            align_corners=self.align_corners)  # (4,128,60,60)
        # Semantic features gate the detail features at both resolutions.
        fuse_1 = detail_dwconv * torch.sigmoid(semantic_conv)  # (4,128,60,60)
        fuse_2 = detail_down * torch.sigmoid(semantic_dwconv)  # (4,128,15,15)
        fuse_2 = resize(
            input=fuse_2,
            size=fuse_1.shape[2:],
            mode='bilinear',
            align_corners=self.align_corners)  # (4,128,60,60)
        output = self.conv(fuse_1 + fuse_2)  # (4,128,60,60)
        return output
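The guidance step in forward is an element-wise product with a sigmoid, so the semantic response acts as a soft gate on the detail features. The following framework-free sketch shows that gating on a few scalar values and re-derives the 60 to 15 downsampling of detail_down. The sample values and the helper names (sigmoid, gate, down_size) are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gate(features, guidance):
    """Element-wise product of features with sigmoid(guidance)."""
    return [f * sigmoid(g) for f, g in zip(features, guidance)]

def down_size(size, kernel=3, stride=2, padding=1):
    """Output length after a stride-2 conv or pooling layer with a 3x3 window."""
    return (size + 2 * padding - kernel) // stride + 1

detail = [2.0, 2.0, 2.0]
semantic = [-20.0, 0.0, 20.0]    # hypothetical guidance responses
gated = gate(detail, semantic)
print([round(v, 3) for v in gated])   # [0.0, 1.0, 2.0]

# detail_down: stride-2 conv followed by stride-2 average pooling.
detail_down_size = down_size(down_size(60))
print(detail_down_size)               # 15
```

A strongly negative semantic response suppresses the detail feature, a strongly positive one passes it through almost unchanged, and the downsampled detail map ends up at the same 15x15 size as the semantic input.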
Booster Training Strategy
To further improve segmentation accuracy, the authors propose a booster training strategy. It enhances the feature representation during training and can be discarded at inference, so it adds no inference cost. As shown in figure (3), auxiliary segmentation heads are attached at different positions of the semantic branch to provide extra supervision on the model's intermediate outputs, which improves the accuracy of the model.
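In terms of the objective, this amounts to adding the auxiliary heads' losses to the main loss during training and ignoring those heads at inference. The following is a minimal sketch under stated assumptions: the function name, the loss values and the aux_weight default of 1.0 are hypothetical and are not taken from the paper or from a specific config.

```python
def training_loss(main_loss, aux_losses, aux_weight=1.0):
    """Total training loss: main head plus weighted auxiliary heads."""
    return main_loss + aux_weight * sum(aux_losses)

# Hypothetical per-head loss values for one training step.
main = 0.5
aux = [0.25, 0.25, 0.5, 0.5]     # one value per auxiliary head

print(training_loss(main, aux))                   # 2.0
print(training_loss(main, aux, aux_weight=0.0))   # 0.5, main head only
```

Because the auxiliary heads only contribute loss terms, removing them after training leaves the inference graph unchanged.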
Implementation process
Taking the BiSeNet v2 implementation in MMSegmentation as an example, let's walk through the concrete process.
Assume batch_size=4 and an input shape of (4, 3, 480, 480).
- The output of the Detail Branch is (4, 128, 60, 60).
- For the Semantic Branch, as shown in Table (1), the output of the Stem Block is (4, 16, 120, 120), the output of S3 is (4, 32, 60, 60), the output of S4 is (4, 64, 30, 30), and the outputs of S5 are the second GE layer's output (4, 128, 15, 15) and the final CE layer's output (4, 128, 15, 15). The semantic branch therefore returns a list of 5 outputs. The final CE output and the detail branch output are fed into the BGA layer; during training, the first 4 outputs serve as inputs to the auxiliary segmentation heads.
- The output of Bilateral Guided Aggregation is (4, 128, 60, 60).
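The shapes in the list above follow directly from each stage's channel count and total downsampling factor. The sketch below recomputes them for the assumed (4, 3, 480, 480) input; the stage names and the table layout are hypothetical labels chosen for this example.

```python
batch, size = 4, 480

# (name, channels, total downsampling factor) for each output described above.
stages = [
    ("detail", 128, 8),
    ("stem", 16, 4),
    ("S3", 32, 8),
    ("S4", 64, 16),
    ("S5_GE", 128, 32),
    ("S5_CE", 128, 32),
]

shapes = {name: (batch, ch, size // f, size // f) for name, ch, f in stages}
for name, shape in shapes.items():
    print(name, shape)

# BGA upsamples the semantic features to the detail resolution, so its output
# has the same shape as the detail branch output.
bga_shape = shapes["detail"]
print("BGA", bga_shape)   # BGA (4, 128, 60, 60)
```

The detail branch and S3 share the 1/8 resolution, while the CE output that enters the BGA layer is at 1/32, which is why the BGA layer needs its two resize calls.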
Experimental Results
Cityscapes
CamVid