[Paper notes] Street-View Change Detection with Deconvolutional Networks

2022-06-25 15:14:00 【m0_ sixty-one million eight hundred and ninety-nine thousand on】

The paper

Paper title: Street-View Change Detection with Deconvolutional Networks

Paper link: https://www.researchgate.net/publication/304533064_Street-View_Change_Detection_with_Deconvolutional_Networks

Main content

The paper proposes a system for detecting structural changes in street-view videos captured by a vehicle-mounted monocular camera. Multi-sensor fusion SLAM and a fast, dense 3D reconstruction pipeline provide roughly registered image pairs to a deep deconvolutional network for pixel-level change detection.

It also proposes CDNet, an efficient CNN architecture built from stacked contraction and expansion blocks, to detect changes between image pairs. With 1.4 million parameters it is relatively compact and strikes a balance between performance and model size, which makes it suitable for mobile platforms and less prone to overfitting on small datasets.

Contributions:

A deep deconvolutional architecture is proposed that performs strongly on street-view change detection (better than hand-crafted descriptors) while staying lightweight enough for embedded devices (1.4M parameters).
A new dataset for urban scene change detection is introduced, containing challenging seasonal and lighting changes.
A multi-sensor fusion SLAM system is designed and combined with a fast dense reconstruction pipeline to roughly align image pairs, which enables change detection across time.

Process Overview

(a) The video sequences from times t1 and t2 are processed by a multi-sensor fusion SLAM system that takes GPS, inertial odometry and RGB image data into account to estimate the vehicle motion and a sparse 3D reconstruction;
(b) The sequences are registered across time using approximate GPS positions, robust feature matching and bundle adjustment;
(c) The reconstruction is densified efficiently with a new slanted-plane smoother; the resulting depth maps are used for reprojection (π) to align the images;
(d) A deconvolutional network predicts the changes between the aligned RGB images;
(e) The changes predicted by the network are shown in red. Nuisances caused by lighting and seasonal changes are handled correctly.
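
The reprojection in step (c) can be illustrated with a minimal sketch. It assumes a pinhole camera with the same intrinsics in both views and a known relative pose; the function name `reproject` and all parameter names are hypothetical and not taken from the paper.

```python
from typing import Sequence, Tuple


def reproject(
    u: float, v: float, depth: float,
    fx: float, fy: float, cx: float, cy: float,
    rotation: Sequence[Sequence[float]],
    translation: Sequence[float],
) -> Tuple[float, float]:
    """Map pixel (u, v) with known depth from camera 1 into camera 2.

    Assumption: pinhole model, identical intrinsics (fx, fy, cx, cy) in
    both views, and a relative pose such that p2 = R * p1 + t.
    """
    # Back-project the pixel to a 3D point in the frame of camera 1.
    p1 = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Move the point into the frame of camera 2.
    x2, y2, z2 = (
        sum(rotation[i][j] * p1[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z2 <= 0:
        raise ValueError("point is behind the second camera")
    # Project the point onto the image plane of camera 2.
    return fx * x2 / z2 + cx, fy * y2 / z2 + cy
```

With an identity rotation and a translation of 1 unit along x, a pixel at depth 10 seen with fx = 500 shifts by 50 pixels horizontally; applying this to every pixel of a depth map gives the warped, roughly aligned image.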

Network architecture （CDNet）

CDNet consists of 4 contraction blocks, each made up of CONV, BNORM, ReLU and max-pooling layers, followed by 4 expansion blocks, each made up of an unpooling layer (guided by the pooling indices stored by the corresponding contraction block), CONV, BNORM and ReLU layers. The last layer is a linear operator followed by a softmax classifier. As a preprocessing step, the input RGB images are normalized per channel.

The 4 blocks of the contraction network build a rich representation; the 4 blocks of the expansion network refine the localization and delineation of the changed regions. The final change decision is made by a linear softmax classifier applied densely to every pixel.
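
As a rough illustration of the last stage, the sketch below applies a linear classifier followed by a softmax independently at every pixel of a feature map. The two-class layout (class 1 meaning "change"), the nested-list representation and the function names are assumptions made for this example only.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pixels(features, weights, biases):
    """Apply the same linear + softmax classifier at every pixel.

    features: H x W x C nested lists, weights: K x C, biases: K.
    Returns an H x W grid of K class probabilities.
    """
    return [
        [
            softmax([
                sum(w * f for w, f in zip(weights[k], pixel)) + biases[k]
                for k in range(len(weights))
            ])
            for pixel in row
        ]
        for row in features
    ]


def change_mask(probabilities, change_class=1):
    """Mark a pixel as changed when the change class has the top probability."""
    return [
        [max(range(len(p)), key=p.__getitem__) == change_class for p in row]
        for row in probabilities
    ]
```

Because the same weights are shared by all pixels, this stage is equivalent to a 1×1 convolution over the final feature map.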

Every contraction block contains a 7×7 convolution layer with a fixed number of 64 features. Before the nonlinear activation, the output is batch normalized (batch normalization, BN) to reduce internal covariate shift during training and improve convergence. The BN parameters are not derived from precomputed statistics; they are learned as additional parameters. The nonlinearity is a standard rectified linear unit (ReLU), followed by a 2×2 max-pooling layer with stride 2 that reduces the spatial dimensions. After this operation, the indices of the strongest responses are stored so that the corresponding expansion block can later use them to upsample the data cleanly.
Every expansion block first upsamples its input with an unpooling layer. This layer uses the previously stored indices to produce an upsampled version of the input in which activations at edge locations and other high-frequency features are preserved. The unpooling is followed by a 7×7 convolution with a fixed number of 64 features. As before, the pre-activations are batch normalized before the ReLU. Stacking the contraction and expansion blocks in this way makes the network fully symmetric in the number of features.
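
The pairing of max-pooling and unpooling described above can be sketched on a single 2D feature map. This is only an illustration of the index-passing idea in plain Python, not the paper's implementation; the function names are made up for this example.

```python
from typing import List, Tuple

Grid = List[List[float]]
Index = Tuple[int, int]


def max_pool_2x2(x: Grid) -> Tuple[Grid, List[List[Index]]]:
    """2x2 max-pooling with stride 2 that also records where each maximum was."""
    pooled, indices = [], []
    for r in range(0, len(x) - 1, 2):
        pooled_row, index_row = [], []
        for c in range(0, len(x[0]) - 1, 2):
            window = [
                (x[r + dr][c + dc], (r + dr, c + dc))
                for dr in (0, 1)
                for dc in (0, 1)
            ]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def unpool_2x2(pooled: Grid, indices: List[List[Index]],
               height: int, width: int) -> Grid:
    """Place each pooled value back at its stored position; zeros elsewhere."""
    out = [[0.0] * width for _ in range(height)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out
```

For the input `[[1, 2, 0, 0], [3, 4, 0, 9], [5, 0, 1, 1], [0, 0, 1, 2]]`, pooling gives `[[4, 9], [5, 2]]` and unpooling puts 4, 9, 5 and 2 back at their original positions, so strong responses such as edges return to where they were detected instead of being spread uniformly over the window.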

Notes:
1. A two-channel network training scheme is used.
2. The upsampling parameters of the unpooling layers in the expansion network are the indices stored by the corresponding max-pooling layers.
3. The batch normalization parameters are included in the network's parameter set and optimized together with it.
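
As a sanity check on the model size, the learnable parameters can be tallied from the layer sizes given above. The sketch assumes a 6-channel input (the two RGB images stacked), one bias per convolution output channel, a learned scale and shift per BN channel, and a 1×1 two-class classifier at the end; these details are assumptions of this example and are not confirmed by the notes.

```python
def conv_params(kernel: int, in_channels: int, out_channels: int) -> int:
    """Convolution weights plus one bias per output channel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def batch_norm_params(channels: int) -> int:
    """One learned scale and one learned shift per channel."""
    return 2 * channels


def cdnet_param_count(in_channels: int = 6, features: int = 64,
                      kernel: int = 7, blocks: int = 4,
                      classes: int = 2) -> int:
    """Tally learnable parameters under the assumptions stated above."""
    total = 0
    channels = in_channels
    # Contraction blocks: CONV + BNORM (ReLU and max-pooling add nothing).
    for _ in range(blocks):
        total += conv_params(kernel, channels, features)
        total += batch_norm_params(features)
        channels = features
    # Expansion blocks: unpooling reuses stored indices, so only CONV + BNORM count.
    for _ in range(blocks):
        total += conv_params(kernel, features, features)
        total += batch_norm_params(features)
    # Final per-pixel linear classifier, modelled here as a 1x1 convolution.
    total += conv_params(1, features, classes)
    return total
```

Under these assumptions `cdnet_param_count()` returns 1,425,410, which is in line with the 1.4M parameters quoted in the contributions. Almost all of the weight sits in the seven 64-to-64 convolutions of size 7×7.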

Dense 3D Reconstruction