[Paper notes] Street-View Change Detection with Deconvolutional Networks
2022-06-25 15:14:00 【m0_ sixty-one million eight hundred and ninety-nine thousand on】
The paper

Paper title: Street-View Change Detection with Deconvolutional Networks
Paper link: https://www.researchgate.net/publication/304533064_Street-View_Change_Detection_with_Deconvolutional_Networks
Main content
The paper proposes a system for detecting structural changes in street-view video captured by a vehicle-mounted monocular camera. A multi-sensor fusion SLAM system is coupled with a fast dense 3D reconstruction pipeline, which supplies coarsely registered image pairs to a deep deconvolutional network for pixel-level change detection.
It introduces CDNet, an efficient CNN architecture built from stacked contraction and expansion blocks, to detect changes between image pairs. With about 1.4 million parameters it is relatively compact, balances performance against model size, suits mobile platforms, and is less prone to overfitting on small datasets.
Contributions:
- A deep deconvolutional architecture that performs strongly on street-view change detection (better than hand-crafted descriptors) while staying light enough for embedded devices (1.4M parameters).
- A new dataset for urban scene change detection that contains challenging seasonal and lighting changes.
- A multi-sensor fusion SLAM system, combined with a fast dense reconstruction pipeline, that roughly aligns image pairs so that changes can be detected across time.
Process Overview


- (a) The video sequences for times t1 and t2 are processed by a multi-sensor fusion SLAM system that uses GPS, inertial odometry and RGB image data to produce the vehicle motion and a sparse 3D reconstruction;
- (b) The sequences are registered across time by means of approximate GPS localization, robust feature matching and bundle adjustment;
- (c) The reconstruction is efficiently densified with a new slanted-plane smoother method; the depth maps are used to reproject (π) the images and align them;
- (d) A deconvolutional network predicts the changes between the aligned RGB images;
- (e) The changes predicted by the network are shown in red. Nuisances caused by lighting and seasonal changes are handled correctly.
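Step (c) can be illustrated with a minimal sketch of depth-based reprojection. The notes do not give the paper's exact formulation, so the pinhole camera model, the intrinsics `K`, and all function names below are assumptions made for illustration only.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with known depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform(point, R, t):
    """Apply a rigid transform: R is a 3x3 rotation (list of rows), t a translation."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Pinhole projection (the pi operator) of a 3D point onto the image plane."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def reproject_pixel(u, v, depth, K, R, t):
    """Map a pixel of the t1 image into the t2 image using its depth."""
    fx, fy, cx, cy = K
    p1 = backproject(u, v, depth, fx, fy, cx, cy)
    p2 = transform(p1, R, t)
    return project(p2, fx, fy, cx, cy)


# Hypothetical intrinsics (fx, fy, cx, cy) and relative poses, for illustration.
K = (500.0, 500.0, 320.0, 240.0)
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# With an identity pose the pixel maps onto itself.
same = reproject_pixel(100.0, 50.0, 8.0, K, IDENTITY, (0.0, 0.0, 0.0))
# A 1 m sideways translation shifts a point at 10 m depth by fx * 1 / 10 pixels.
shifted = reproject_pixel(320.0, 240.0, 10.0, K, IDENTITY, (1.0, 0.0, 0.0))
```

Applying this mapping to every pixel that has a depth value gives a roughly aligned image pair, which is all the network needs since it is trained to tolerate residual misalignment.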
Network architecture (CDNet)

CDNet has 4 contraction blocks, each made of CONV, BNORM, ReLU and max-pooling layers, and 4 expansion blocks, each made of an unpooling layer (driven by the pooling indices stored by the corresponding contraction block) followed by CONV, BNORM and ReLU layers. The last layer is a linear operator followed by a softmax classifier. As a preprocessing step, the input RGB images are normalized channel-wise.
The 4 blocks of the contraction network create a rich representation; the 4 blocks of the expansion network refine the localization and delineation of the changed regions. The final change decision is made by a linear softmax classifier applied densely to every pixel.
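The dense per-pixel decision can be sketched as follows. This is a minimal illustration, not the paper's code: the class ordering (index 0 for "no change", index 1 for "change"), the 0.5 threshold, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def change_mask(score_map):
    """score_map[r][c] holds [no_change_score, change_score] for one pixel.

    Returns a 0/1 mask that marks pixels whose change probability exceeds 0.5.
    """
    return [
        [1 if softmax(pixel)[1] > 0.5 else 0 for pixel in row]
        for row in score_map
    ]


# Hypothetical 2x2 map of linear-classifier outputs.
scores = [
    [[2.0, -1.0], [0.0, 3.0]],
    [[1.0, 1.5], [4.0, 0.0]],
]
mask = change_mask(scores)
```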
- Every contraction block starts with a 7×7 convolution layer with a fixed number of 64 features. Before the nonlinear activation, the output is batch normalized (batch normalization, BN) to reduce internal covariate shift during training and improve convergence. The BN parameters are not obtained from statistics computations; they are learned as additional parameters. The nonlinearity is a standard rectified linear unit (ReLU), followed by a 2×2 max-pooling layer with stride 2 that reduces the spatial dimensions. After this operation, the indices of the maximum responses are stored so that the corresponding expansion block can later use them to perform a clean upsampling of the data.
- Every expansion block first upsamples its input with an unpooling layer. This layer uses the previously stored indices to produce an upsampled version of the input that preserves activations at edge locations and other high-frequency features. It is followed by a 7×7 convolution with a fixed number of 64 features. As before, the pre-activations are batch normalized (BN) before the ReLU. Stacking expansion and contraction blocks this way makes the network fully symmetric in its number of features.
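The pairing of max-pooling with index-based unpooling can be shown on a single 2D feature map. This is a minimal pure-Python sketch with assumed function names; a real network would apply the same idea per channel through a deep learning framework.

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D list.

    Returns (pooled, indices), where indices[i][j] is the (row, col) position
    in the input of the maximum that produced pooled[i][j].
    """
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h - 1, 2):
        pooled_row, index_row = [], []
        for j in range(0, w - 1, 2):
            window = [(x[r][c], (r, c)) for r in (i, i + 1) for c in (j, j + 1)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Place each pooled value back at its stored position; zeros elsewhere."""
    out = [[0.0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature = [
    [1.0, 3.0, 0.0, 2.0],
    [4.0, 2.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 6.0],
    [2.0, 0.0, 7.0, 1.0],
]
pooled, indices = max_pool_2x2(feature)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

The unpooled map is sparse: only the positions that won the max survive, which is why each expansion block follows the unpooling with a convolution that densifies the result.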
Notes:
1. A two-channel network training approach is adopted.
2. The unpooling (upsampling) layers of the expansion network reuse the indices stored by the max-pooling layers.
3. The batch normalization parameters are added to the network's parameter set and optimized with it.
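As a sanity check on the 1.4M parameter figure, the architecture described above can be counted by hand. The sketch below rests on several assumptions that the notes do not state: the image pair is stacked into a 6-channel input, every convolution has a bias, each BN layer learns one scale and one shift per feature, and the final linear operator is a 1×1 convolution onto 2 classes.

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Weights (plus optional biases) of a kernel x kernel convolution."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def bn_params(channels):
    """One learned scale and one learned shift per channel."""
    return 2 * channels


def cdnet_param_estimate(in_channels=6, features=64, kernel=7, blocks=4, classes=2):
    """Rough parameter count for 4 contraction + 4 expansion blocks."""
    total = 0
    c_in = in_channels
    for _ in range(blocks):  # contraction blocks: CONV + BNORM
        total += conv_params(kernel, c_in, features) + bn_params(features)
        c_in = features
    for _ in range(blocks):  # expansion blocks: CONV + BNORM (unpooling has no weights)
        total += conv_params(kernel, features, features) + bn_params(features)
    total += conv_params(1, features, classes)  # assumed 1x1 linear classifier
    return total


estimate_pair_input = cdnet_param_estimate(in_channels=6)
estimate_rgb_input = cdnet_param_estimate(in_channels=3)
```

Under these assumptions the count is 1,425,410 for a 6-channel input and 1,416,002 for a 3-channel input, so either reading lands at roughly 1.4 million. Almost all of the weights sit in the seven 64-to-64 convolutions of size 7×7.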
Dense 3D Reconstruction

Training:
Both the convolution and deconvolution blocks are randomly initialized. The network is trained with the Adam optimizer using its default parameters.
Convergence is fast: within 200 epochs, with 150 batches per epoch and a batch size of 10 image pairs.
Loss function: weighted cross-entropy, with the class weights chosen according to the inverse frequency of each class in the training set.
Experiments
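The inverse-frequency weighting can be sketched as below. The notes only say that weights follow the inverse class frequency, so the exact normalization of the weights and of the loss (a weighted mean over pixels) are assumptions, as are the function names.

```python
import math


def inverse_frequency_weights(labels, num_classes=2):
    """Weight each class by total / (num_classes * count), so rare classes count more."""
    counts = [0] * num_classes
    for y in labels:
        counts[y] += 1
    total = len(labels)
    return [total / (num_classes * c) if c > 0 else 0.0 for c in counts]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); probs[i] lists class probabilities of pixel i."""
    numerator = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    denominator = sum(weights[y] for y in labels)
    return numerator / denominator


# Hypothetical pixel labels: 8 "no change" pixels (0) and 2 "change" pixels (1).
labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(labels)

# An uninformative prediction of 0.5 for every pixel gives a loss of log(2).
uniform_probs = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(uniform_probs, labels, weights)
```

With these labels the "change" class gets weight 2.5 and the "no change" class 0.625, so the 2 changed pixels contribute as much to the loss as the 8 unchanged ones. This keeps the network from collapsing to the trivial "no change everywhere" prediction.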

VL-CMU-CD dataset:

PCD dataset:

Visualization:
