CVPR2022——A VERSATILE MULTI-VIEW FRAMEWORK

2022-08-11 06:15:00 【zhSunw】

A VERSATILE MULTI-VIEW FRAMEWORK FOR LIDAR-BASED 3D OBJECT DETECTION WITH GUIDANCE FROM PANOPTIC SEGMENTATION

Contribution:
Key knowledge:
- Cascade RV Feature Fusion Module: Fusing Multi-Level Range View Features
- Attention-based RV-BEV Feature Weighting Module: Weights the RV and BEV feature maps to highlight important feature values.
- Class-wise Foreground Attention Module: Embedding Foreground Semantic Information into Features
- Center Density Heatmap Module: Calculate Center Density
Experiments

In the framework diagram, grey marks the CPSeg modules and blue marks the CenterPoint modules.

Contribution:

Proposes a multi-task framework (panoptic segmentation + object detection) in which panoptic segmentation improves object detection.
The framework can be applied to any BEV-based object detection method.
Extensive experiments and ablation studies.

Key knowledge:

Multi-View Backbone Augmentation
RV (range view): The feature representation is dense, which makes small objects easier to detect. However, it suffers from scale variation (near objects appear large, far objects appear small) and occlusion.
BEV (bird's-eye view): There is no scale variation or occlusion, so densely packed targets are easier to detect and object boundaries are easier to determine. However, its sparsity makes small objects harder to detect.
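
The scale difference between the two views can be illustrated with a little geometry. The sketch below is not from the paper: the function names, the horizontal angular resolution `h_res_deg`, and the BEV cell size `cell_size_m` are hypothetical values chosen only for illustration.

```python
import math


def rv_width_pixels(object_width_m, distance_m, h_res_deg=0.2):
    """Approximate number of range-view columns an object spans.

    h_res_deg is a hypothetical horizontal angular resolution (assumption).
    The apparent width shrinks as the object moves away from the sensor.
    """
    angle_rad = 2.0 * math.atan(object_width_m / (2.0 * distance_m))
    return math.degrees(angle_rad) / h_res_deg


def bev_width_cells(object_width_m, cell_size_m=0.1):
    """Number of BEV grid cells an object spans; independent of distance.

    cell_size_m is a hypothetical BEV grid resolution (assumption).
    """
    return object_width_m / cell_size_m


if __name__ == "__main__":
    # A 2 m wide object at three distances: RV width changes, BEV width does not.
    for distance in (5.0, 20.0, 80.0):
        print(distance, rv_width_pixels(2.0, distance), bev_width_cells(2.0))
```

The same object covers fewer and fewer range-view columns as the distance grows, while its BEV footprint stays constant, which is the "near big, far small" effect described above.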

Cascade RV Feature Fusion Module: Fusing Multi-Level Range View Features

The panoptic segmentation network provides feature maps at three different resolutions. Starting from r1 (the lowest resolution), each feature map is first weighted by a Convolutional Block Attention Module (CBAM), then upsampled by deconvolution and concatenated with the next higher-resolution feature map, until the highest-resolution feature map r3 is reached. The result is then projected onto the BEV plane, and a Space2Depth + 1×1 Conv downsampling block compresses it so that its resolution matches the 2D backbone of the detection head.
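
The Space2Depth + 1×1 Conv downsampling step can be sketched in NumPy as follows. This is a minimal illustration, not the paper's code: the function names, the block size of 2, and the weight shapes are assumptions, and the 1×1 convolution is written as a per-pixel matrix multiplication.

```python
import numpy as np


def space_to_depth(x, block=2):
    """Rearrange a (C, H, W) map into (C*block*block, H/block, W/block).

    Spatial resolution is reduced without discarding any value: each
    block x block patch is moved into the channel dimension.
    """
    c, h, w = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    x = x.reshape(c, h // block, block, w // block, block)
    x = x.transpose(0, 2, 4, 1, 3)  # (C, block, block, H/block, W/block)
    return x.reshape(c * block * block, h // block, w // block)


def conv1x1(x, weight):
    """1x1 convolution on (C_in, H, W); weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bev_features = rng.normal(size=(16, 8, 8))      # hypothetical projected RV features
    compress = rng.normal(size=(32, 16 * 4))         # hypothetical 1x1 conv weights
    out = conv1x1(space_to_depth(bev_features), compress)
    print(out.shape)
```

Compared with strided pooling, this kind of downsampling keeps every input value and lets the 1×1 convolution decide how to compress the enlarged channel dimension.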

Attention-based RV-BEV Feature Weighting Module: Weights the two RV-BEV feature maps to highlight important feature values.

A modified CBAM module is applied to both the RV and BEV feature maps. For each feature map, a channel attention map and a spatial attention map are computed separately; the two are broadcast, added, and then passed through an activation to generate the attention map. The input RV-BEV features are weighted by this attention map to highlight feature values that are helpful for the detection task. Finally, the weighted features are added to the input features to obtain an attention-weighted multi-view feature map.
Visualization of the attention-based feature weighting map: the red and blue regions mark where the BEV and RV features, respectively, are given higher weights.
In the attention-based RV-BEV weighting module, RV features for nearby and smaller objects are assigned higher weights, while occluded and distant objects tend to favor BEV features.
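
The broadcast-add, activate, weight, and residual-add sequence described in this section can be sketched as below. This is a minimal sketch under stated assumptions: the channel and spatial attention maps are taken as given inputs (in the module they come from the modified CBAM), the activation is assumed to be a sigmoid, and the function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def attention_weight(features, channel_att, spatial_att):
    """Weight a feature map with combined channel and spatial attention.

    features:    (C, H, W) input feature map
    channel_att: (C, 1, 1) channel attention (assumed precomputed)
    spatial_att: (1, H, W) spatial attention (assumed precomputed)
    """
    # Broadcasting expands both maps to (C, H, W) before the addition.
    attention = sigmoid(channel_att + spatial_att)  # sigmoid is an assumption
    # Weighted features plus the residual connection to the input.
    return features + features * attention


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 6, 6))
    out = attention_weight(feats, rng.normal(size=(4, 1, 1)), rng.normal(size=(1, 6, 6)))
    print(out.shape)
```

Because of the residual addition, a location with near-zero attention still keeps its original feature value instead of being suppressed completely.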

Class-wise Foreground Attention Module: Embedding Foreground Semantic Information into Features

Because the paper provides no diagram of this module and the code has not been open-sourced, the module diagram was drawn from the text description according to my own understanding.
The per-point foreground class probabilities provided by the panoptic segmentation (background points are disregarded) are downsampled by max pooling to the same resolution as the RV-BEV weighted feature map. The feature map is then weighted class by class: for each class, the feature map is multiplied element-wise by that class's probability map, and a 1×1 convolution compresses the channel dimension to obtain the Class-wise Foreground Attention Map. Finally, the weighted maps of all classes are concatenated, a 1×1 convolution restores the shape of the input RV-BEV feature map, and the two are added to obtain the final weighted feature map.
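
Since the module is described only in text, the steps above can be sketched roughly as follows. This is one possible reading, not the authors' implementation: it assumes the class probabilities are already projected onto a BEV grid, and the function names, the pooling factor, and all weight shapes are hypothetical.

```python
import numpy as np


def max_pool2d(x, k):
    """Non-overlapping k x k max pooling on a (C, H, W) array."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))


def conv1x1(x, weight):
    """1x1 convolution on (C_in, H, W); weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def classwise_foreground_attention(features, class_probs, reduce_w, fuse_w, pool=2):
    """
    features:    (C, H, W) RV-BEV weighted feature map
    class_probs: (K, H*pool, W*pool) foreground class probabilities (assumed on a BEV grid)
    reduce_w:    (K, C_r, C) per-class 1x1 conv weights that compress channels
    fuse_w:      (C, K*C_r) 1x1 conv weights that restore the input shape
    """
    probs = max_pool2d(class_probs, pool)                  # (K, H, W)
    per_class = [
        conv1x1(features * probs[k][None], reduce_w[k])    # weight, then compress
        for k in range(probs.shape[0])
    ]
    stacked = np.concatenate(per_class, axis=0)            # (K*C_r, H, W)
    return features + conv1x1(stacked, fuse_w)             # residual addition


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 4))
    probs = rng.uniform(size=(3, 8, 8))
    out = classwise_foreground_attention(
        feats, probs, rng.normal(size=(3, 2, 8)), rng.normal(size=(8, 6))
    )
    print(out.shape)
```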

Center Density Heatmap Module: Calculate Center Density

The Center Density Heatmap is built from the per-point 3D bounding-box center offsets estimated by CPSeg:
First, the foreground mask is used to filter out the background points. The remaining points are shifted by their predicted offsets and projected onto the BEV plane. A heatmap is then created with the heatmap function:
[Formula image: heatmap function]
C(x,y) is the number of shifted points projected onto location (x,y).
The more often a location is predicted as a center, the larger its value and the darker (green) its color.
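
The counting step that produces C(x,y) can be sketched as below. This covers only the masking, shifting, projection, and counting described in the text; the heatmap function itself is not reproduced here. The function name, the grid origin, and the cell size are hypothetical.

```python
import numpy as np


def center_count_map(points_xy, offsets_xy, foreground_mask, grid_min, cell_size, grid_shape):
    """Count how many shifted foreground points land in each BEV cell.

    points_xy:       (N, 2) point coordinates on the ground plane
    offsets_xy:      (N, 2) predicted offsets toward the object center
    foreground_mask: (N,) boolean mask, True for foreground points
    grid_min:        (2,) coordinates of the grid origin (assumption)
    cell_size:       BEV cell edge length (assumption)
    grid_shape:      (X, Y) number of cells along each axis
    """
    shifted = points_xy[foreground_mask] + offsets_xy[foreground_mask]
    idx = np.floor((shifted - np.asarray(grid_min)) / cell_size).astype(int)
    inside = (
        (idx[:, 0] >= 0) & (idx[:, 0] < grid_shape[0])
        & (idx[:, 1] >= 0) & (idx[:, 1] < grid_shape[1])
    )
    idx = idx[inside]
    counts = np.zeros(grid_shape, dtype=np.int64)
    # np.add.at accumulates correctly even when the same cell appears many times.
    np.add.at(counts, (idx[:, 0], idx[:, 1]), 1)
    return counts


if __name__ == "__main__":
    pts = np.array([[0.2, 0.2], [2.6, 2.6], [3.5, 3.5], [1.4, 0.5]])
    off = np.array([[1.3, 1.3], [-1.1, -1.1], [0.0, 0.0], [0.1, 1.0]])
    fg = np.array([True, True, False, True])
    print(center_count_map(pts, off, fg, (0.0, 0.0), 1.0, (4, 4)))
```

Points that belong to the same object vote for roughly the same cell, so cells near true object centers accumulate large counts.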

Experiments

Comparison with SOTA:
Ablation experiments:

Effectiveness of the multi-task framework:

Effectiveness of multi-task (single-task pre-trained CPSeg):