Technical deep dive | Shengsi MindSpore innovative model EPP-MVSNet: high-precision and efficient 3D reconstruction
2022-07-03 07:34:00 【Shengsi MindSpore】
● Background ●
Dense reconstruction (multi-view stereo, MVS) takes images captured from multiple viewpoints together with their camera poses and predicts per-pixel depth through dense pixel-level matching under spatial geometric constraints. It is widely used in AR/VR, gaming, surveying and mapping, and other fields. Since MVSNet [1] was proposed, learning-based methods that build cost volumes over multi-view pairs using fronto-parallel plane sweeping and differentiable homography have shown their advantages on a growing number of public datasets. However, the accuracy and efficiency of these methods depend heavily on the depth hypotheses, and in scenes with a large depth range it is hard to balance computation against accuracy. CasMVSNet [2] later proposed the coarse-to-fine paradigm: a coarse global depth range is predicted on low-resolution features, and refinement is left to high-resolution features. This paradigm limits the total number of depth hypotheses, so depth accuracy can be improved without increasing computation, but how the depth hypotheses themselves should be set has still received little discussion.
To address these problems, the Huawei MindSpore team proposed EPP-MVSNet, a high-precision and efficient dense reconstruction algorithm that sets depth hypotheses in a principled way. EPP-MVSNet inherits the coarse-to-fine idea and optimizes the depth hypotheses of the coarse and fine stages with the EAM (epipolar assembling module) and the ER (entropy refining strategy) module, respectively. In addition, simplifying the 3D regularization network further improves the computational efficiency of the whole model. As of March 2021, EPP-MVSNet ranked first on the intermediate leaderboard and fourth on the advanced leaderboard of Tanks & Temples, a public benchmark for dense reconstruction. The paper has been accepted by ICCV 2021, and the code is open-sourced on MindSpore.
Paper link:
https://openaccess.thecvf.com/content/ICCV2021/html/Ma_EPP-MVSNet_Epipolar-Assembling_Based_Depth_Prediction_for_Multi-View_Stereo_ICCV_2021_paper.html
Code link:
https://gitee.com/mindspore/models/tree/master/research/cv/eppmvsnet
● Terminology ●
Depth hypothesis: during depth prediction, a hypothesized depth plane on which the 3D point corresponding to a pixel may lie.
Depth interval: during depth prediction, the spacing between adjacent depth hypothesis planes.
Epipolar line: a real 3D point of an object and the camera centers of the main and auxiliary views define the epipolar plane; the intersection of the epipolar plane with the auxiliary view is called the epipolar line.
Main view: during depth prediction, the image whose depth is to be predicted.
Auxiliary views: during depth prediction, the set of images most strongly associated with the main view.
Cost volume: during depth prediction, the matching relationship between the pixels of the main view, under the preset depth hypotheses, and the corresponding sampling points on the auxiliary views.
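To make the relationship between depth hypotheses and depth interval concrete, here is a minimal sketch. It assumes evenly spaced hypothesis planes over a fixed depth range; the function name `depth_hypotheses` and the numeric values are illustrative and not taken from the EPP-MVSNet code.

```python
import numpy as np

def depth_hypotheses(depth_min, depth_max, num_planes):
    """Return evenly spaced depth hypothesis planes and the interval between them."""
    depths = np.linspace(depth_min, depth_max, num_planes)
    interval = (depth_max - depth_min) / (num_planes - 1)
    return depths, interval

# Same depth range, different numbers of hypotheses (illustrative values).
coarse, coarse_interval = depth_hypotheses(2.0, 10.0, 5)
fine, fine_interval = depth_hypotheses(2.0, 10.0, 17)
print(coarse, coarse_interval)
print(len(fine), fine_interval)
```

With the range fixed, fewer planes mean a larger interval, which is the trade-off between cost volume size and depth resolution discussed in this article.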
● Algorithm overview ●
Learning-based coarse-to-fine dense reconstruction methods consist of four main steps: 2D feature extraction, cost volume construction, 3D regularization, and depth prediction. The main innovations of EPP-MVSNet, the EAM and ER modules, focus on the key step of cost volume construction. The algorithm pipeline is shown in the figure below.
Epipolar Assembling Module
Setting the depth hypotheses is a key step in most coarse-to-fine dense reconstruction methods, and the settings differ considerably between stages. The coarse stage must place its depth hypotheses across the global depth range, yet only a relatively small number can be used if the cost volume is to stay at a manageable size. As a result, the depth interval in the coarse stage is relatively large. On the auxiliary view this shows up as sparsely distributed sampling points along the epipolar line, so key feature points are easily missed.
To address this problem, EPP-MVSNet proposes the EAM (epipolar assembling module). It first computes how the original sampling points are distributed on the auxiliary view under the default depth hypotheses, then adaptively inserts new sampling points according to the spacing between them. Because the spatial geometry differs between the main view and each auxiliary view, the sampling points are distributed differently for each view pair. With this strategy, EPP-MVSNet adaptively maintains the sampling density for every pair and reduces the chance of missing key feature points.
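The adaptive insertion can be sketched as follows. This is a simplified illustration, not the EPP-MVSNet implementation: it assumes a rectified camera pair (identical intrinsics, pure x-translation), so a main-view pixel at depth d lands at u - focal * baseline / d in the auxiliary view. The names `epipolar_positions`, `densify`, and `max_gap_px` are hypothetical.

```python
import math
import numpy as np

def epipolar_positions(u, depths, focal, baseline):
    """x-coordinate in the auxiliary view of main-view column u at each depth.
    Assumption: rectified pair, so the epipolar line is horizontal."""
    return u - focal * baseline / depths

def densify(depths, focal, baseline, max_gap_px=1.0, u=0.0):
    """Pick one interpolation factor for this view pair from the largest pixel gap
    between adjacent sampling points, then insert (factor - 1) evenly spaced
    depths inside every original depth interval."""
    pos = epipolar_positions(u, depths, focal, baseline)
    gaps = np.abs(np.diff(pos))
    factor = max(1, math.ceil(gaps.max() / max_gap_px))
    idx = np.arange((len(depths) - 1) * factor + 1) / factor
    dense = np.interp(idx, np.arange(len(depths)), depths)
    return dense, factor

depths = np.array([2.0, 4.0, 6.0, 8.0])
dense, factor = densify(depths, focal=100.0, baseline=0.1)
print(factor, dense)
```

Because the factor is derived from the projected spacing, a view pair with a wider baseline receives more inserted points than one with a narrow baseline. Interpolating linearly in depth only approximately evens out the pixel spacing, since the projected position is nonlinear in depth.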
Adaptive interpolation increases the sampling density, but it inevitably makes the cost volume grow linearly, and the cost volumes generated by the main view with different auxiliary views may end up with different shapes. To control the cost volume size, EAM applies a "depth convolution" to the interpolated cost volume for simple information extraction, then downsamples it with max pooling whose window size and stride are chosen according to the amount of interpolation, restoring the shape it had before interpolation. This dynamic pooling also aggregates the cost information of the interpolated points along the depth dimension onto the original points.
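The shape-restoring pooling step can be illustrated with a small NumPy sketch. It assumes the densified cost volume has (D - 1) * factor + 1 depth samples, uses a window centered on each original sample with stride equal to the interpolation factor, and omits the convolution step. The function name `pool_back` and the padding scheme are assumptions made for illustration.

```python
import numpy as np

def pool_back(cost, factor):
    """Max-pool a densified cost volume of shape (D_dense, H, W) along depth.
    Window = 2 * (factor // 2) + 1 centered on each original sample, stride = factor."""
    half = factor // 2
    # Pad with -inf so border windows only see real samples.
    padded = np.pad(cost, ((half, half), (0, 0), (0, 0)), constant_values=-np.inf)
    num_out = (cost.shape[0] - 1) // factor + 1
    windows = [padded[i * factor : i * factor + 2 * half + 1] for i in range(num_out)]
    return np.stack([w.max(axis=0) for w in windows])

# 4 original depth samples densified by a factor of 3 give 10 samples.
dense_cost = np.arange(10, dtype=float).reshape(10, 1, 1)
pooled = pool_back(dense_cost, factor=3)
print(pooled.shape, pooled.ravel())
```

Whatever the factor for a given view pair, the output always has the original number of depth samples, so cost volumes from different auxiliary views can be aggregated together.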
After EAM processing, the cost volume keeps its original shape, but each cost voxel aggregates information from the neighboring interpolated sampling points. Its receptive field is much larger than before, which enables more accurate depth prediction.
Entropy Refining Strategy
The EAM mainly addresses the depth hypotheses of the coarse stage. In the refine stages, the depth hypotheses are generally centered on the depth predicted by the previous stage and extended by a certain interval on both sides to form the depth hypothesis range. Because refine-stage depth prediction operates on high-resolution features, computational cost rules out a large number of depth hypotheses. If the hypothesis range is uniformly narrowed for the sake of accuracy, an inaccurate prediction from the previous stage may leave the true depth outside the range. If the range is conservatively widened, the depth interval grows and prediction accuracy drops.
To address this, EPP-MVSNet proposes the ER (entropy refining strategy) module, which adaptively adjusts the depth hypothesis range of the next stage according to the depth prediction of the current stage. The ER module uses entropy as a measure of the model's confidence in its prediction: the larger the entropy, the less the model trusts the predicted result.
In the entropy formula, E denotes the entropy, k the current stage, M the number of depth hypotheses in the current stage, P the depth prediction probability, p the pixel position, and d the corresponding depth hypothesis.
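As an illustration of the quantities just defined, the sketch below computes a per-pixel entropy from a probability volume using the standard Shannon entropy over the M depth hypotheses. Dividing by log(M) so that the result lies in [0, 1] is an assumption made here for convenience; the exact form used in the paper may differ. The function name `depth_entropy` is hypothetical.

```python
import numpy as np

def depth_entropy(prob, eps=1e-12):
    """prob: array of shape (M, H, W), probabilities over M depth hypotheses per pixel.
    Returns an (H, W) map of Shannon entropy, normalized by log(M) (assumption)."""
    m = prob.shape[0]
    ent = -(prob * np.log(prob + eps)).sum(axis=0)
    return ent / np.log(m)

# Pixel 0: uniform distribution (least confident); pixel 1: one-hot (most confident).
prob = np.zeros((4, 1, 2))
prob[:, 0, 0] = 0.25
prob[2, 0, 1] = 1.0
entropy = depth_entropy(prob)
print(entropy)
```

A uniform distribution yields an entropy near 1 and a one-hot distribution an entropy near 0, matching the reading that higher entropy means lower confidence.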
After the entropy of every point on the current stage's depth map has been computed, the depth hypothesis range of the next stage is obtained from the range formula, in which r denotes the depth range and the remaining coefficient is a hyperparameter.
The ER module thus adaptively determines a suitable depth hypothesis range for the next stage from the depth prediction of each stage, further improving refine-stage depth accuracy while reducing cases where the true depth is wrongly excluded from the hypothesis range.
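The idea of an entropy-dependent range can be sketched as below. The scaling rule is purely hypothetical and is NOT the formula from the paper: it only reproduces the qualitative behavior described here, namely a wider range for uncertain pixels and a narrower one for confident pixels. The names `next_stage_hypotheses` and `min_scale` are illustrative.

```python
import numpy as np

def next_stage_hypotheses(depth, entropy, cur_range, num_planes, min_scale=0.25):
    """depth, entropy: (H, W) maps from the current stage; entropy assumed in [0, 1].
    Hypothetical rule: the range shrinks to min_scale * cur_range when entropy is 0
    and stays at cur_range when entropy is 1."""
    scale = min_scale + (1.0 - min_scale) * np.clip(entropy, 0.0, 1.0)
    next_range = cur_range * scale                    # per-pixel range, shape (H, W)
    offsets = np.linspace(-0.5, 0.5, num_planes)       # relative plane offsets, shape (M,)
    # Hypotheses centered on the current depth prediction, shape (M, H, W).
    return depth[None] + offsets[:, None, None] * next_range[None]

depth = np.array([[5.0, 5.0]])
entropy = np.array([[0.0, 1.0]])
hyp = next_stage_hypotheses(depth, entropy, cur_range=4.0, num_planes=5)
print(hyp[:, 0, 0])  # confident pixel: narrow range
print(hyp[:, 0, 1])  # uncertain pixel: wide range
```

With the number of planes fixed, the confident pixel gets a smaller depth interval and therefore finer depth resolution, while the uncertain pixel keeps a wide range so the true depth is less likely to be excluded.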
Light-weighted Network
For cost volume aggregation, EPP-MVSNet follows [3] and uses weighted aggregation. To save computation, the visibility map is generated only in the coarse stage, and later stages reuse the weights by upsampling. To further improve the computational efficiency of the whole model, EPP-MVSNet, inspired by [4], replaces conventional 3D convolutions with pseudo-3D convolutions. Specifically, the 3D regularization network of EPP-MVSNet is built entirely from 3*1*1 and 1*3*3 convolutions, which extract cost volume information along separate dimensions.
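A quick parameter count shows why this factorization is cheaper. The sketch compares one 3*3*3 convolution with a 1*3*3 convolution followed by a 3*1*1 convolution. The channel width of 8, the ordering of the two convolutions, and the omission of bias terms are assumptions for illustration.

```python
def conv3d_params(c_in, c_out, kernel):
    """Number of weights in a 3D convolution (bias omitted)."""
    kd, kh, kw = kernel
    return c_in * c_out * kd * kh * kw

def pseudo3d_params(c_in, c_out):
    """A 1x3x3 spatial convolution followed by a 3x1x1 depth convolution (assumed layout)."""
    return conv3d_params(c_in, c_out, (1, 3, 3)) + conv3d_params(c_out, c_out, (3, 1, 1))

full = conv3d_params(8, 8, (3, 3, 3))
pseudo = pseudo3d_params(8, 8)
print(full, pseudo, pseudo / full)
```

With equal input and output channels, the factorized pair uses 12 weights per channel pair instead of 27, a reduction of more than half. The multiply-accumulate count per output voxel scales the same way.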
With these settings, both the computational efficiency and the memory usage of EPP-MVSNet are further improved.
● Experimental results ●
As shown in the figure above, EPP-MVSNet with the EAM and ER modules surpasses previous SOTA methods on the Tanks & Temples intermediate dataset.
As of March 18, 2021, EPP-MVSNet ranked first on the Tanks & Temples intermediate leaderboard and fourth on the advanced leaderboard.
The EAM effectively addresses the sampling sparsity caused by the relatively large depth interval of the coarse stage's global depth hypotheses, and its dynamic pooling design keeps computation under control while improving depth prediction accuracy. The ER module offers a new approach to setting refine-stage depth hypotheses in coarse-to-fine methods. Finally, thanks to the lightweight network design and the use of pseudo-3D convolution, EPP-MVSNet also runs faster than most SOTA methods, second only to PatchmatchNet [5].
That concludes this introduction to EPP-MVSNet. Discussion, criticism, and corrections are welcome.
We look forward to your participation in MindSpore.
MindSpore official website: https://www.mindspore.cn/
MindSpore forum: https://bbs.huaweicloud.com/forum/forum-1076-1.html
References:
[1] Yao Y, Luo Z, Li S, et al. MVSNet: Depth inference for unstructured multi-view stereo[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 767-783.
[2] Gu X, Fan Z, Zhu S, et al. Cascade cost volume for high-resolution multi-view stereo and stereo matching[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 2495-2504.
[3] https://arxiv.org/abs/2008.07928
[4] Qiu Z, Yao T, Mei T. Learning spatio-temporal representation with pseudo-3d residual networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 5533-5541.
[5] Wang F, Galliani S, Vogel C, et al. PatchmatchNet: Learned Multi-View Patchmatch Stereo[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 14194-14203.
MindSpore official information
GitHub: https://github.com/mindspore-ai/mindspore
Gitee: https://gitee.com/mindspore/mindspore
Official QQ group: 871543426