
CVPR2022 | PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation

2022-06-29 13:09:00 (CV Technical Guide, official account)

Preface: This paper proposes PanopticDepth, a unified framework for depth-aware panoptic segmentation (DPS) that aims to reconstruct a 3D scene with instance-level semantics from a single image. The framework applies dynamic convolution to both panoptic segmentation (PS) and depth prediction, generating instance-specific kernels that predict the depth and the segmentation mask of each instance. In addition, an instance-wise depth estimation scheme adds instance-level depth cues that help supervise depth learning through a new depth loss.

Welcome to the official account CV Technical guide , Focus on computer vision technology summary 、 The latest technology tracking 、 Interpretation of classic papers 、CV Recruitment information .

Paper: PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation

Paper link: http://arxiv.org/pdf/2206.00468

Code: https://github.com/NaiyuGao/PanopticDepth

Background

Depth-aware panoptic segmentation (DPS) is a new and challenging scene-understanding task that aims to build a 3D scene with instance-level semantic understanding from a single image.

A simple solution to DPS is to add a dense depth regression head to a panoptic segmentation (PS) network so that it produces a depth value for every labeled pixel. This approach is intuitive but suboptimal.

Because it handles the two tasks with two separate branches, it does not exploit the mutually beneficial relationship between them; in particular, it does not use the readily available instance-level semantic cues to improve depth accuracy.

In addition, the authors observe that pixels of adjacent instances usually have discontinuous depths. For example, two cars parked one behind the other may lie at different depths. It is therefore difficult to predict accurate depths for both vehicles with the same per-pixel depth regressor.

On the other hand, since these pixels belong to different vehicles, the authors argue that using a separate regressor for each vehicle would benefit depth estimation.

Based on these observations, the authors propose PanopticDepth, a unified framework that predicts the mask and the depth values of each instance in the same instance-wise manner (Figure 1).

Figure 1: An example of the unified solution for depth-aware panoptic segmentation

Contributions

1. Instance-specific dynamic convolution kernels are used to unify depth estimation and panoptic segmentation, improving performance on both tasks.

2. To simplify depth estimation, and inspired by batch normalization, each instance depth map is represented as a triplet: a normalized depth map, a depth range, and a depth shift. The normalized depth map rescales the values of the original instance depth map to [0, 1], which makes learning more efficient.

3. Instance-level depth statistics from the new representation (such as the depth shift) are added to strengthen depth supervision, and a corresponding depth loss is designed to accommodate this new supervision and improve depth prediction.

Method

PanopticDepth is a unified depth-aware panoptic segmentation model that predicts the mask and depth values of each instance in the same way. Besides the backbone and the feature pyramid network (FPN), it contains three sub-networks: a kernel generator that produces instance classes along with instance-specific mask and depth convolution kernels; a panoptic segmentation model that generates instance masks; and an instance depth map generator that estimates instance depth. The network architecture is shown in Figure 2.

Figure 2: The PanopticDepth framework

1. Kernel generator

The kernel generator sub-network produces the instance classes, the mask convolution kernels, and the depth estimation kernels (top half of Figure 2). It is built on PanopticFCN, a state-of-the-art panoptic segmentation model that applies dynamic convolution to PS and requires less training time and GPU memory than other recent methods.

The kernel generator adopted by the authors has two stages: kernel generation and kernel fusion. In the kernel generation stage, the generator takes the single-level feature from the i-th FPN stage as input and produces a kernel weight map together with two position maps, one for things and one for stuff. Given the position maps and kernel weight maps of every FPN stage, the kernel fusion stage then merges repeated kernel weights across FPN stages using the proposed adaptive kernel fusion (AKF) operation.
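The two stages can be sketched roughly as follows. This is a simplified NumPy illustration in the spirit of PanopticFCN, not the paper's code: it assumes things are located by local peaks of a center-style position map and stuff by a soft region map, and it substitutes a plain cosine-similarity merge for AKF, whose exact weighting is not described in this article. The function names `thing_kernels`, `stuff_kernel`, and `fuse_kernels` and both thresholds are hypothetical.

```python
import numpy as np


def thing_kernels(kernel_map, center_map, threshold=0.5):
    """Take one kernel per local maximum of a thing position map (assumed center heatmap).

    kernel_map: (C, H, W) kernel weight map from one FPN stage.
    center_map: (H, W) float thing position map with scores in [0, 1].
    Returns an (N, C) array, one row per detected peak.
    """
    C = kernel_map.shape[0]
    H, W = center_map.shape
    padded = np.pad(center_map, 1, constant_values=-np.inf)
    kernels = []
    for y in range(H):
        for x in range(W):
            score = center_map[y, x]
            # 3x3 neighbourhood around (y, x); padding keeps border pixels valid
            if score >= threshold and score == padded[y:y + 3, x:x + 3].max():
                kernels.append(kernel_map[:, y, x])
    return np.array(kernels).reshape(-1, C)


def stuff_kernel(kernel_map, region_map):
    """Average the kernel weight map over a stuff region, weighted by region scores."""
    weights = region_map / region_map.sum()
    return (kernel_map * weights[None]).sum(axis=(1, 2))


def fuse_kernels(kernels, sim_threshold=0.9):
    """Greedily average kernels whose cosine similarity exceeds sim_threshold.

    Hypothetical stand-in for AKF: kernels of the same object found at
    several FPN stages collapse into a single averaged kernel.
    """
    unit = kernels / np.linalg.norm(kernels, axis=1, keepdims=True)
    used = np.zeros(len(kernels), dtype=bool)
    fused = []
    for i in range(len(kernels)):
        if used[i]:
            continue
        group = ~used & (unit @ unit[i] > sim_threshold)
        used |= group
        fused.append(kernels[group].mean(axis=0))
    return np.array(fused).reshape(-1, kernels.shape[1])
```

In use, the kernels from all FPN stages would be concatenated row-wise and passed to `fuse_kernels` once, so that duplicates detected at different scales end up as one kernel per object.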

2. Panoptic segmentation

Panoptic segmentation is performed with the instance-specific kernels, as shown at the bottom of Figure 2. The masks M of thing and stuff instances are obtained by convolving the shared high-resolution mask embedding map Em with the mask kernels Km, followed by a Sigmoid activation: M = Sigmoid(Km ⊗ Em), where ⊗ denotes convolution.
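With 1×1 dynamic kernels, the convolution in this step reduces to a dot product between each kernel and the embedding vector at every pixel. The sketch below assumes 1×1 kernels and the shapes noted in the docstring; `instance_masks` is a hypothetical helper name, not an API from the paper's repository.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def instance_masks(mask_kernels, mask_embedding):
    """Apply N instance-specific 1x1 kernels to a shared embedding.

    mask_kernels:   (N, C), one kernel per thing/stuff instance (Km).
    mask_embedding: (C, H, W), shared high-resolution mask embedding (Em).
    Returns (N, H, W) soft masks M = Sigmoid(Km conv Em) in (0, 1).
    """
    # A 1x1 convolution per instance: contract the channel axis at every pixel.
    logits = np.einsum("nc,chw->nhw", mask_kernels, mask_embedding)
    return sigmoid(logits)
```

Applied to depth kernels and the depth embedding instead, the same function gives the normalized instance depth maps described in Section 3.1.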

Redundant instance masks are discarded first. All remaining instance masks are then merged with argmax to produce non-overlapping panoptic segmentation results, so every pixel is assigned to either a thing or a stuff segment and no pixel is labeled "VOID".
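A minimal sketch of the merge step, assuming "redundant" masks are those whose confidence score falls below a threshold; the article does not say how redundancy is decided, and `merge_panoptic` and `score_threshold` are hypothetical. Because argmax always picks some surviving instance, no pixel is left as VOID.

```python
import numpy as np


def merge_panoptic(masks, scores, score_threshold=0.3):
    """Merge soft instance masks into a non-overlapping panoptic id map.

    masks:  (N, H, W) soft masks for thing and stuff instances.
    scores: (N,) confidence per instance.
    Returns an (H, W) array of original instance indices; every pixel gets one.
    """
    keep = np.flatnonzero(scores >= score_threshold)
    if keep.size == 0:
        raise ValueError("no instance survives the score threshold")
    winner = masks[keep].argmax(axis=0)  # index into the kept subset
    return keep[winner]                  # map back to original instance ids
```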

In addition, the authors propose an extra training step, full-scale fine-tuning: the learned model is fine-tuned on full-scale images with a small batch size to bridge the performance gap between training and testing.

3. Instance-wise depth estimation

The depth of each instance is predicted with the same instance-specific kernel technique used for panoptic segmentation, which unifies the two tasks. As shown in the middle of Figure 2, the depth kernels are first applied to the depth embedding to produce per-instance depth maps, and these maps are then combined according to the panoptic segmentation result to form the final whole-image depth map.

3.1 Depth generator

Given the instance-specific depth kernel Kd and the shared depth embedding Ed, the normalized instance depth map D' is generated by convolution followed by a Sigmoid activation, just as the instance masks are: D' = Sigmoid(Kd ⊗ Ed). D' is then denormalized into the depth map D using Equation 4 or Equation 5 of the paper.

The normalized depth map D' encodes only the relative depth values within each instance, so it is easier to learn. The authors developed two normalization schemes, Equation 4 and Equation 5, and found the latter to work better.
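The triplet representation from the contributions list can be sketched as a min-max normalization. This assumes the depth shift is the minimum depth inside the instance and the depth range is the maximum minus the minimum, so that D = D' × range + shift; the article does not reproduce Equations 4 and 5, so the exact form used in the paper may differ. The function names are hypothetical.

```python
import numpy as np


def normalize_instance_depth(depth, mask, eps=1e-6):
    """Split one instance's depth into (normalized map D', depth range, depth shift).

    depth: (H, W) depth map; mask: (H, W) boolean mask of the instance.
    Values inside the mask are rescaled to [0, 1]; pixels outside stay 0.
    Assumes shift = min depth and range = max - min within the mask.
    """
    values = depth[mask]
    shift = values.min()
    depth_range = values.max() - shift
    normalized = np.zeros_like(depth, dtype=float)
    normalized[mask] = (values - shift) / max(depth_range, eps)
    return normalized, depth_range, shift


def denormalize_instance_depth(normalized, depth_range, shift):
    """Recover absolute depth: D = D' * range + shift."""
    return normalized * depth_range + shift
```

Under this assumption the network only has to output values in [0, 1] through the Sigmoid, while the absolute scale of each instance is carried by two scalars, which matches the point above that D' is easier to learn.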

After all instance depth maps are obtained, they are aggregated into a whole-image depth map according to the non-overlapping panoptic segmentation masks M. Because each pixel takes its depth from the single instance it belongs to, this produces accurate depth values at instance boundaries.
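Given an id map from the panoptic merge, aggregation is a per-pixel lookup: each pixel takes its depth from the instance that owns it. A sketch with a hypothetical function name; note how the output can jump between neighbouring pixels at an instance boundary instead of being smoothed.

```python
import numpy as np


def aggregate_depth(instance_depths, panoptic_ids):
    """Compose a whole-image depth map from per-instance depth maps.

    instance_depths: (N, H, W) denormalized depth map for each instance.
    panoptic_ids:    (H, W) integer map of which instance owns each pixel.
    """
    # At every pixel, pick the depth predicted by the owning instance.
    return np.take_along_axis(instance_depths, panoptic_ids[None], axis=0)[0]
```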

3.2 Depth loss

The depth loss combines the scale-invariant logarithmic error and the relative squared error.
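For reference, the two standard terms can be written as below: the scale-invariant logarithmic error of Eigen et al., mean(d²) − λ·mean(d)² with d = log(prediction) − log(ground truth) over valid pixels, and the squared relative error (prediction − ground truth)² / ground truth. The default λ = 0.5, the unit weights used to add the two terms, and the function names are assumptions; the article does not give the paper's exact combination.

```python
import numpy as np


def silog_loss(pred, gt, lam=0.5):
    """Scale-invariant log error: mean(d^2) - lam * mean(d)^2, d = log(pred) - log(gt)."""
    d = np.log(pred) - np.log(gt)
    return np.mean(d ** 2) - lam * np.mean(d) ** 2


def sq_rel_loss(pred, gt):
    """Squared relative error: mean((pred - gt)^2 / gt)."""
    return np.mean((pred - gt) ** 2 / gt)


def depth_loss(pred, gt, w_silog=1.0, w_rel=1.0):
    """Weighted sum of both terms over pixels with valid (positive) ground truth.

    Assumes predictions are strictly positive so the log is defined.
    """
    valid = gt > 0
    p, g = pred[valid], gt[valid]
    return w_silog * silog_loss(p, g) + w_rel * sq_rel_loss(p, g)
```

With `lam=1.0` the first term ignores a global scale factor entirely (a prediction that is uniformly twice the ground truth scores zero), which is one reason to pair it with a term such as the relative error that does penalize scale.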

Thanks to the instance-wise depth estimation, the authors can learn depth prediction under both conventional pixel-level supervision and additional instance-level supervision, which empirically improves depth accuracy. To realize this dual supervision, the final depth loss Ldep consists of two terms: a pixel-level depth loss and an instance-level depth loss.
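One possible reading of this dual supervision is sketched below. The pixel-level term is any per-pixel depth loss on the merged depth map (passed in as a callable), and the instance-level term compares per-instance depth statistics, shift (min) and range (max − min), in line with the triplet representation described in the contributions. The L1 form of the instance term, the weight `w_ins`, and all function names are hypothetical; the paper's actual instance-level loss may be defined differently.

```python
import numpy as np


def instance_stats(depth, mask):
    """Depth shift (min) and depth range (max - min) inside one instance mask."""
    values = depth[mask]
    return values.min(), values.max() - values.min()


def instance_level_loss(pred, gt, instance_masks):
    """Mean L1 error of per-instance depth shift and range (hypothetical form)."""
    terms = []
    for mask in instance_masks:
        if not mask.any():
            continue  # skip instances with no pixels
        p_shift, p_range = instance_stats(pred, mask)
        g_shift, g_range = instance_stats(gt, mask)
        terms.append(abs(p_shift - g_shift) + abs(p_range - g_range))
    return float(np.mean(terms)) if terms else 0.0


def total_depth_loss(pred, gt, instance_masks, pixel_loss, w_ins=1.0):
    """L_dep = pixel-level loss + w_ins * instance-level loss."""
    return pixel_loss(pred, gt) + w_ins * instance_level_loss(pred, gt, instance_masks)
```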

Experiments

Table 1: Panoptic segmentation results on the Cityscapes validation and test sets. "AKF": adaptive kernel fusion; "FSF": full-scale fine-tuning

Table 2: Depth-aware panoptic segmentation results on Cityscapes-DPS

Table 3: Ablation study on Cityscapes-DPS. "IDE": instance-wise depth estimation; "IDN": instance depth normalization

Table 4: Monocular depth estimation on Cityscapes for methods that use panoptic segmentation annotations

Figure 3: Pixel-wise depth estimation outputs smooth values at the boundary between two instances, whereas instance-wise depth estimation produces more reasonable discontinuous depth values

Figure 4: Example predictions of the PanopticDepth model

Conclusion

This paper proposes a unified depth-aware panoptic segmentation framework that generates instance-specific kernels to predict the depth and segmentation mask of each instance.

Dynamic kernels bring high-level object information into depth estimation, and each instance depth map is normalized with a depth shift and a depth range to simplify learning of the shared depth embedding.

In addition, the paper proposes a new depth loss that supervises depth learning with instance-level depth cues. Experiments on the Cityscapes-DPS and SemKITTI-DPS benchmarks demonstrate the effectiveness of the method.
