
CVPR 2022: Stereo matching of resolution-asymmetric images

2022-06-26 23:23:00 [ScarLeTzzz]


0. Abstract

Research question: stereo matching of an image pair with different resolutions.
Method: unsupervised learning, feature-metric consistency, and a self-boosting optimization strategy.
Validation: the method was tested on simulated datasets with a variety of degradations and on a self-collected real dataset, and it outperforms existing solutions on both.

1. Introduction

Background and significance: camera systems with two (or more) lenses of different focal lengths are now widely used in smartphones. Such a system typically captures a pair (or a group) of images with different resolutions in a single shot. This enables desirable applications such as continuous optical zoom and image quality enhancement. For these applications, estimating correspondences between resolution-asymmetric stereo images is a key step. It is usually done by upsampling the low-resolution image and then running a traditional symmetric stereo matching algorithm such as SGM. This approach is, however, sensitive to the artifacts introduced by upsampling, and the effect grows with the upsampling factor.
Contributions:
- The first unsupervised learning method for estimating correspondences from resolution-asymmetric stereo pairs;
- Feature-metric consistency, which avoids the photometric inconsistency caused by unknown degradations;
- A self-boosting strategy that strengthens feature-metric consistency by progressively updating the loss;
- Clear performance gains over the compared methods on both simulated and real datasets.
Overview of the method:
- Unsupervised learning: in the resolution-asymmetric setting, supervised methods need ground-truth disparities. They also need high-quality versions of the degraded views as labels, and the degradation model must be specified before the network parameters can be learned. This makes them hard to apply to the varied and complex degradations of real-world systems, so the authors turn to unsupervised learning.
- Feature-metric consistency: for unsupervised stereo matching, the most widely used assumption is photometric consistency.
  - As shown in Fig. 1(a), corresponding pixels in a symmetric stereo pair satisfy $I_{L}[p_{L}]=I_{R}[p_{R}]$;
  - As shown in Fig. 1(b), corresponding pixels in an asymmetric stereo pair may differ in intensity or color, and this photometric inconsistency makes correspondence learning harder;
  - An existing workaround is to restore the LR view to an HR view with super-resolution (SR). However, most existing SR methods are degradation-specific, and their performance drops when the actual degradation differs from the assumed one;
  - This paper instead enforces consistency between the two views in feature space rather than in image space, which it calls feature-metric consistency. A feature extractor produces consistent features, i.e., $F_{L}[p_{L}]=F_{R}[p_{R}]$ in Fig. 1(b). These features are then used to formulate a feature-metric loss that sidesteps photometric inconsistency.
    Feature-metric consistency was discovered experimentally: training the network with the photometric loss is suboptimal, but the feature extractor trained this way still extracts consistent features.

Self-boosting optimization strategy: when the stereo matching network is optimized with the feature-metric loss, its feature extractor is optimized as well, which can further strengthen feature-metric consistency. A self-boosting strategy is therefore introduced to optimize the feature extractor iteratively. It works in stages:
- the feature extractor learned in the previous stage forms a new feature-metric loss for the current stage;
- training the network with this loss yields a new feature extractor;
- that extractor in turn forms the loss for the next stage.
As a result, the method remains effective even under severe degradation.

2. Resolution-asymmetric stereo matching method

Flowchart of the proposed method:

2.1 Photometric consistency learning

The input is a rectified stereo pair $I_{L}$ and $I_{r \uparrow}$, where $I_{r \uparrow}$ is the low-resolution right view upsampled to the left view's resolution. An unsupervised stereo matching network $\Phi(\cdot ; \theta)$ predicts the disparity map with respect to the left view, $d_{L}=\Phi(I_{L},I_{r \uparrow} ; \theta)$. The network is trained on the photometric consistency of corresponding points:
$$I_{L}[p_{L}]=I_{r \uparrow}[p_{r \uparrow}] \tag{1}$$
If the disparity $d_{L}[p_{L}]$ is estimated accurately, the left image $I_{L}[p_{L}]$ can be reconstructed by warping $I_{r \uparrow}$ with that disparity:
$$I_{r \uparrow \rightarrow L}[p_{L}]=I_{r \uparrow}[p_{L}-d_{L}[p_{L}]] \tag{2}$$
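Eq. (2) is a backward warp: every left-view pixel fetches its match from the upsampled right view, shifted by the predicted disparity. The NumPy sketch below is illustrative only. The function name, the linear interpolation along x, and the border clamping are assumptions rather than details given in the post; a trainable network would use a differentiable sampler instead.

```python
import numpy as np


def warp_right_to_left(img_right_up: np.ndarray, disp_left: np.ndarray) -> np.ndarray:
    """Backward-warp the upsampled right view into the left view, as in Eq. (2).

    img_right_up: (H, W) or (H, W, C) image I_{r_up}.
    disp_left:    (H, W) left-view disparity d_L in pixels (may be fractional).
    Returns I_{r_up->L}, where pixel (y, x) is sampled at (y, x - d_L[y, x]).
    """
    h, w = disp_left.shape
    # Source column for every left-view pixel, clamped to the image border (assumption).
    src_x = np.clip(np.arange(w, dtype=np.float64)[None, :] - disp_left, 0.0, w - 1.0)
    x0 = np.floor(src_x).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = src_x - x0                      # linear-interpolation weight toward x1
    rows = np.arange(h)[:, None]
    if img_right_up.ndim == 3:             # broadcast the weight over color channels
        frac = frac[..., None]
    return (1.0 - frac) * img_right_up[rows, x0] + frac * img_right_up[rows, x1]


rng = np.random.default_rng(0)
right_up = rng.random((4, 8))
warped = warp_right_to_left(right_up, np.full((4, 8), 2.0))
# Columns x >= 2 now hold right_up[:, x - 2], i.e. the pixels a 2-px disparity points to.
print(np.allclose(warped[:, 2:], right_up[:, :-2]))  # True
```

Pixels whose source falls outside the right view (here the first two columns) simply repeat the border column. Real implementations often mask such occluded or out-of-view pixels out of the loss instead.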
The photometric loss is therefore the error between $I_{L}$ and its reconstruction $I_{r \uparrow \rightarrow L}$. It is typically a combination of the $L_{1}$ and $SSIM$ distances, weighted by $\alpha$:
$$\mathcal{L}_{pm}=\| I_{L}-I_{r \uparrow \rightarrow L} \|_{1}+\alpha (1-SSIM(I_{L},I_{r \uparrow \rightarrow L})) \tag{3}$$

SSIM: Structural Similarity Index Measure.
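Eq. (3) can be sketched in NumPy under a few assumptions: intensities in [0, 1], a 3×3 box-filtered SSIM (a common choice in unsupervised stereo and depth code, not a detail stated in the post), and per-pixel averaging for the $L_1$ term. The function names are hypothetical, and $\alpha$ is left as a parameter because its value is not given here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

C1, C2 = 0.01 ** 2, 0.03 ** 2  # usual SSIM stabilizers for a dynamic range of 1


def box3(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter over the first two (spatial) axes, with edge padding."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (x.ndim - 2)
    windows = sliding_window_view(np.pad(x, pad, mode="edge"), (3, 3), axis=(0, 1))
    return windows.mean(axis=(-2, -1))


def ssim_map(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-pixel SSIM computed from local 3x3 statistics."""
    mu_x, mu_y = box3(x), box3(y)
    var_x = box3(x * x) - mu_x ** 2
    var_y = box3(y * y) - mu_y ** 2
    cov_xy = box3(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)
    den = (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    return num / den


def photometric_loss(img_left: np.ndarray, img_warped: np.ndarray, alpha: float) -> float:
    """L_pm = mean |I_L - I_{r_up->L}| + alpha * (1 - mean SSIM), following Eq. (3)."""
    l1 = np.abs(img_left - img_warped).mean()
    return float(l1 + alpha * (1.0 - ssim_map(img_left, img_warped).mean()))
```

Identical inputs give a loss of 0. A uniform brightness offset of 0.1 alone adds 0.1 through the $L_1$ term, which is the kind of photometric inconsistency that motivates the feature-metric loss.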
An initial network $\Phi^{0}$ is first trained with the photometric consistency loss $\mathcal{L}_{pm}$. It consists of a feature extraction network $\Phi^{0}_{F}$ and a matching network $\Phi^{0}_{M}$.

2.2 Feature-metric consistency learning

Given a stereo pair $I_{L}$ and $I_{r \uparrow}$, the feature extractor $\Phi_{F}(\cdot ;\theta _{F})$ extracts features $F_{L}=\Phi_{F} (I_{L};\theta _{F})$ and $F_{r \uparrow}=\Phi_{F} (I_{r \uparrow};\theta _{F})$. These features are consistent at corresponding points of the asymmetric pair:

$$F_{L}[p_{L}]=F_{r \uparrow}[p_{r \uparrow}] \tag{4}$$

The features $F_{L}$ and $F_{r \uparrow}$ are concatenated into a cost volume. The matching network $\Phi_{M} (\cdot ; \theta _{M})$ regularizes this volume and regresses the disparity map $d_{L}$. Warping with $d_{L}$ gives the reconstructed left view $I_{r \uparrow \rightarrow L}$. The feature extractor $\Phi_{F} (\cdot ;\theta _{F} )$ then projects $I_{L}$ and $I_{r \uparrow \rightarrow L}$ into feature space, giving $F_{L}$ and $F_{r \uparrow \rightarrow L}=\Phi_{F} (I_{r \uparrow \rightarrow L} ;\theta _{F} )$.
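The concatenation cost volume and the disparity regression can be sketched as follows. The layout (2C × D × H × W, with zeros where the shifted right feature falls outside the image) and the soft-argmin regression follow PSMNet, which the post names as BaseNet. The 3D-CNN regularizer $\Phi_{M}$ is omitted, so the sketch regresses from an already-aggregated cost. Function names and shapes are assumptions for illustration.

```python
import numpy as np


def concat_cost_volume(feat_left: np.ndarray, feat_right: np.ndarray, max_disp: int) -> np.ndarray:
    """Stack F_L[:, y, x] with F_{r_up}[:, y, x - d] for every candidate disparity d.

    feat_left, feat_right: (C, H, W) feature maps. Returns a (2C, D, H, W) volume.
    """
    c, h, w = feat_left.shape
    volume = np.zeros((2 * c, max_disp, h, w), dtype=feat_left.dtype)
    for d in range(max_disp):
        volume[:c, d, :, d:] = feat_left[:, :, d:]
        volume[c:, d, :, d:] = feat_right[:, :, : w - d]
    return volume


def soft_argmin(cost: np.ndarray) -> np.ndarray:
    """Regress disparity from a (D, H, W) matching cost: sum_d d * softmax(-cost)_d."""
    logits = -cost - (-cost).max(axis=0, keepdims=True)   # numerically stable softmax
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    disparities = np.arange(cost.shape[0], dtype=np.float64)[:, None, None]
    return (prob * disparities).sum(axis=0)


vol = concat_cost_volume(np.ones((32, 4, 8)), np.ones((32, 4, 8)), max_disp=3)
print(vol.shape)  # (64, 3, 4, 8)
```

Soft-argmin keeps the regression differentiable and allows sub-pixel disparities, which the warping in Eq. (2) then consumes.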
By analogy with the photometric consistency loss, the feature-metric loss $\mathcal{L}_{fm}$ is defined as:

$$\mathcal{L}_{fm}=\| F_{L}-F_{r \uparrow \rightarrow L} \|_{1}+\alpha (1-SSIM(F_{L},F_{r \uparrow \rightarrow L})) \tag{5}$$
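Eq. (5) reuses the form of Eq. (3) but compares feature maps of $I_{L}$ and of the warped view $I_{r \uparrow \rightarrow L}$, produced by a frozen extractor. In the sketch below, the learned $\Phi_{F}$ is replaced by a hypothetical stand-in: horizontal and vertical finite differences, which ignore a uniform brightness offset. This is purely to show why comparing in feature space can tolerate photometric inconsistency. The 3×3 SSIM and mean-$L_1$ choices are also assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

C1, C2 = 0.01 ** 2, 0.03 ** 2


def toy_extractor(img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for Phi_F: (H, W) image -> (2, H-1, W-1) gradient features."""
    dx = np.diff(img, axis=1)[:-1, :]   # horizontal differences, cropped to (H-1, W-1)
    dy = np.diff(img, axis=0)[:, :-1]   # vertical differences, cropped to (H-1, W-1)
    return np.stack([dx, dy])


def box3(x: np.ndarray) -> np.ndarray:
    """3x3 mean filter over the last two (spatial) axes of a (C, H, W) array."""
    padded = np.pad(x, [(0, 0), (1, 1), (1, 1)], mode="edge")
    return sliding_window_view(padded, (3, 3), axis=(1, 2)).mean(axis=(-2, -1))


def ssim_map(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    mu_x, mu_y = box3(x), box3(y)
    var_x, var_y = box3(x * x) - mu_x ** 2, box3(y * y) - mu_y ** 2
    cov_xy = box3(x * y) - mu_x * mu_y
    return ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2))


def feature_metric_loss(img_left, img_warped, extractor, alpha: float) -> float:
    """L_fm = mean |F_L - F_{r_up->L}| + alpha * (1 - mean SSIM) on extractor features, Eq. (5)."""
    f_left, f_warped = extractor(img_left), extractor(img_warped)   # extractor stays frozen
    l1 = np.abs(f_left - f_warped).mean()
    return float(l1 + alpha * (1.0 - ssim_map(f_left, f_warped).mean()))


rng = np.random.default_rng(0)
img = rng.random((16, 16))
brighter = img + 0.1   # same content, uniform photometric offset
print(feature_metric_loss(img, brighter, toy_extractor, alpha=0.85) < 1e-6)  # True
```

The photometric $L_1$ distance between `img` and `brighter` is 0.1, yet their features match, so $\mathcal{L}_{fm}$ stays near zero. A learned extractor is meant to achieve the same effect for the far less regular degradations between the two views.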

The network is then retrained with the feature-metric consistency loss $\mathcal{L}_{fm}$, giving $\Phi^{1}$. It consists of a feature extraction network $\Phi^{1}_{F}$ and a matching network $\Phi^{1}_{M}$.

2.3 Self-boosting strategy

As shown in Fig. 3(b), the strategy proceeds as follows on a given resolution-asymmetric stereo dataset.

First, a stereo matching network $\Phi (\cdot ;\theta^{0} _{F} ; \theta^{0} _{M})$ (abbreviated $\Phi^{0}$) is trained with $\mathcal{L}_{pm}$. Its feature extractor $\Phi^{0}_{F}$ then forms the feature-metric loss $\mathcal{L}^{0}_{fm}$.
$\mathcal{L}^{0}_{fm}$ is then used to optimize a new stereo matching network $\Phi^{1}$. While $\Phi^{1}$ is being trained, the feature extractor used to compute $\mathcal{L}^{0}_{fm}$ is kept fixed.
After training, the enhanced $\Phi^{1}_{F}$ forms a better feature-metric loss $\mathcal{L}^{1}_{fm}$, which is used in the next boosting step. The process repeats, using $\mathcal{L}^{k-1}_{fm}$ to train $\Phi^{k}$ for $k \in \{1,\dots,K\}$.

Note that a new loss $\mathcal{L}^{k}_{fm}$ is built only after $\Phi^{k}$ has converged under $\mathcal{L}^{k-1}_{fm}$, because changing the loss space too often may destabilize training.
Through this self-boosting strategy, a sequence of networks with progressively stronger feature-metric consistency is obtained.
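The staging logic of the self-boosting strategy fits in a short loop. This is a hedged sketch of the control flow only. `train` (train a fresh network to convergence under a given loss) and `make_fm_loss` (build $\mathcal{L}_{fm}$ from a frozen copy of a network's feature extractor) are hypothetical callables, not functions from the authors' code.

```python
from typing import Callable, List, TypeVar

Net = TypeVar("Net")
Loss = TypeVar("Loss")


def self_boosting(
    train: Callable[[Loss], Net],          # hypothetical: trains a new network to convergence
    photometric_loss: Loss,                # L_pm, used only for stage 0
    make_fm_loss: Callable[[Net], Loss],   # hypothetical: L_fm from a frozen copy of Phi_F
    num_stages: int = 3,                   # K; the post reports K = 3
) -> List[Net]:
    """Return [Phi^0, Phi^1, ..., Phi^K]."""
    nets = [train(photometric_loss)]       # Phi^0 trained with L_pm
    for _ in range(num_stages):
        # L^{k-1}_fm is built only once Phi^{k-1} has converged, and its extractor
        # stays fixed while Phi^k is trained.
        loss_prev = make_fm_loss(nets[-1])
        nets.append(train(loss_prev))
    return nets


# Usage with stand-in callables that just record which loss each stage used.
log: List[str] = []


def fake_train(loss: str) -> str:
    log.append(loss)
    return f"Phi^{len(log) - 1}"


nets = self_boosting(fake_train, "L_pm", lambda net: f"L_fm from {net}")
print(nets)  # ['Phi^0', 'Phi^1', 'Phi^2', 'Phi^3']
print(log)   # ['L_pm', 'L_fm from Phi^0', 'L_fm from Phi^1', 'L_fm from Phi^2']
```

Each loss is rebuilt only between stages, never inside a training run. This matches the stability argument above.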

3. Experiments

3.1 Experiments on simulated datasets

Datasets:
- Middlebury and KITTI 2015; Inria_SLFD and HCI
- Degradation types:
  - Bicubic downsampling (BIC)
  - Isotropic Gaussian kernel downsampling (IG)
  - Anisotropic Gaussian kernel downsampling (AG)
  - Isotropic Gaussian kernel downsampling with JPEG compression (IG JPEG)
  - Anisotropic Gaussian kernel downsampling with JPEG compression (AG JPEG)
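The Gaussian-kernel degradations (IG and AG) follow the usual blur-then-downsample model, sketched below in NumPy. Several settings are placeholders rather than the paper's values: kernel size, standard deviations, rotation angle, reflect padding, and the ×4 factor in the example. Bicubic resizing and JPEG compression are left out because they need an image library.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def gaussian_kernel(size: int, sigma_x: float, sigma_y: float, theta: float = 0.0) -> np.ndarray:
    """Normalized size x size Gaussian kernel.

    sigma_x == sigma_y gives an isotropic kernel (IG); unequal sigmas with a
    rotation angle theta (radians) give an anisotropic kernel (AG).
    """
    rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
    inv_cov = np.linalg.inv(rot @ np.diag([sigma_x ** 2, sigma_y ** 2]) @ rot.T)
    r = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(r, r)                 # xx varies along columns, yy along rows
    pts = np.stack([xx, yy], axis=-1)          # (size, size, 2) offsets from the center
    kernel = np.exp(-0.5 * np.einsum("...i,ij,...j->...", pts, inv_cov, pts))
    return kernel / kernel.sum()


def blur_downsample(img: np.ndarray, kernel: np.ndarray, scale: int) -> np.ndarray:
    """Blur a 2D image with an odd-sized kernel (reflect padding), then keep every scale-th pixel."""
    pad = kernel.shape[0] // 2
    windows = sliding_window_view(np.pad(img, pad, mode="reflect"), kernel.shape)
    blurred = np.einsum("hwij,ij->hw", windows, kernel)   # correlation == convolution for symmetric kernels
    return blurred[::scale, ::scale]


hr = np.random.default_rng(0).random((32, 32))
lr_ig = blur_downsample(hr, gaussian_kernel(7, 1.2, 1.2), scale=4)
lr_ag = blur_downsample(hr, gaussian_kernel(7, 2.0, 0.6, theta=np.pi / 4), scale=4)
print(lr_ig.shape, lr_ag.shape)  # (8, 8) (8, 8)
```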
Evaluation metrics:
- 3-pixel error (3PE): the percentage of outliers over all regions, i.e., pixels whose disparity error exceeds both 3 pixels and 5% of the ground-truth disparity;
- End-point error (EPE): the mean absolute difference between the estimated and ground-truth disparities.
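Both metrics are straightforward to compute. In this sketch, the function names and the optional `valid` mask (for sparse or partially missing ground truth) are assumptions. 3PE uses the "> 3 px and > 5% of the ground truth" outlier rule described above and is reported as a percentage.

```python
from typing import Optional

import numpy as np


def end_point_error(pred: np.ndarray, gt: np.ndarray, valid: Optional[np.ndarray] = None) -> float:
    """EPE: mean absolute disparity error over valid pixels."""
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)
    return float(np.abs(pred - gt)[valid].mean())


def three_pixel_error(pred: np.ndarray, gt: np.ndarray, valid: Optional[np.ndarray] = None) -> float:
    """3PE: percentage of valid pixels whose error exceeds both 3 px and 5% of the ground truth."""
    if valid is None:
        valid = np.ones(gt.shape, dtype=bool)
    err = np.abs(pred - gt)
    outliers = (err > 3.0) & (err > 0.05 * np.abs(gt))
    return float(100.0 * outliers[valid].mean())


gt = np.full((2, 2), 10.0)
pred = np.array([[10.0, 10.0], [14.0, 20.0]])   # errors: 0, 0, 4, 10
print(end_point_error(pred, gt))     # 3.5
print(three_pixel_error(pred, gt))   # 50.0
```

The relative 5% condition matters only for large disparities. For example, a 4-px error on a true disparity of 100 px is not an outlier.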
Compared methods:
- SGM
- SR preprocessing + BaseNet
  - RCAN+BaseNet
  - DAN+BaseNet
- BaseNet + other feature extractors
  - BaseNet+CL
  - BaseNet+AE

SR: super-resolution. RCAN is a non-blind SR method and DAN a blind SR method.
Other feature extractors: a feature network trained with a contrastive loss (CL), and an auto-encoder (AE) used as the feature network.
In all cases, BaseNet is the popular PSMNet. The network is optimized with the Adam solver ($\beta_1=0.9,\beta_2=0.999$) at a learning rate of 0.001. A disparity smoothness constraint is imposed through a weighted smoothness loss:
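$$\mathcal{L}_{sm}=\sum_{p}\left(\left|\partial_{x}d_{L}[p]\right|e^{-\left|\partial_{x}I_{L}[p]\right|}+\left|\partial_{y}d_{L}[p]\right|e^{-\left|\partial_{y}I_{L}[p]\right|}\right) \tag{6}$$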
The total loss for all learning-based solutions can therefore be written as:
$$\mathcal{L}=\mathcal{L}_{pm/fm}+\lambda\mathcal{L}_{sm} \tag{7}$$
Here $\lambda$ is the weighting factor of the smoothness loss $\mathcal{L}_{sm}$. $\mathcal{L}_{pm/fm}$ is the photometric loss for the first group of methods (SR preprocessing + BaseNet). For the second group (other feature extractors) and for the proposed method, it is the corresponding feature-metric loss. The number of stages $K$ in the self-boosting strategy is set to 3.
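Below is a NumPy sketch of the smoothness term and the total loss (data term plus $\lambda$ times the smoothness term). It assumes the common edge-aware form of the weighted smoothness loss, in which disparity gradients are down-weighted by $e^{-|\partial I_{L}|}$ and averaged rather than summed over pixels. $\lambda$ is left as a parameter because its value is not given here, and the function names are hypothetical.

```python
import numpy as np


def smoothness_loss(disp: np.ndarray, img: np.ndarray) -> float:
    """Edge-aware smoothness (assumed form): |grad d| * exp(-|grad I|), averaged over pixels.

    disp: (H, W) disparity d_L; img: (H, W) or (H, W, C) left image I_L.
    """
    dx_d = np.abs(np.diff(disp, axis=1))
    dy_d = np.abs(np.diff(disp, axis=0))
    dx_i = np.abs(np.diff(img, axis=1))
    dy_i = np.abs(np.diff(img, axis=0))
    if img.ndim == 3:                       # average image gradients over channels
        dx_i, dy_i = dx_i.mean(axis=-1), dy_i.mean(axis=-1)
    return float((dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean())


def total_loss(data_term: float, disp: np.ndarray, img: np.ndarray, lam: float) -> float:
    """L = L_pm/fm + lambda * L_sm; data_term is the photometric or feature-metric loss."""
    return data_term + lam * smoothness_loss(disp, img)


disp = np.zeros((4, 4))
disp[:, 2:] = 1.0              # disparity step between columns 1 and 2
flat_img = np.zeros((4, 4))
edge_img = np.zeros((4, 4))
edge_img[:, 2:] = 1.0          # image edge at the same place
# The step is penalized less where the image itself has an edge.
print(smoothness_loss(disp, edge_img) < smoothness_loss(disp, flat_img))  # True
```

The exponential weighting lets disparity discontinuities align with image edges, where depth boundaries usually occur, while still smoothing textureless regions.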

Results:
- Quantitative results
- Qualitative results

3.2 Experiments on real datasets

Dataset: asymmetric stereo pairs were captured with a Huawei P30 smartphone, with an asymmetry factor of about 3. After camera calibration and stereo rectification, 30 asymmetric stereo pairs of indoor and outdoor scenes were collected. Five pairs were randomly selected as the test set, and the rest were used for training.
Results:

4. Limitations and conclusions

Limitations:
- Besides resolution, other types of asymmetry (such as color and brightness) may exist. Whether the proposed method can be extended directly to handle them remains an open question.
Conclusion:
- This paper shows that the main challenge in unsupervised correspondence estimation from resolution-asymmetric stereo images is photometric inconsistency. To address it, the authors achieve feature-metric consistency in an efficient way and introduce a self-boosting strategy that strengthens this consistency. Comprehensive experiments show that the method performs well under a variety of practical degradations between the two views.


Copyright notice
This article was written by [ScarLeTzzz]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/177/202206262259295622.html