Reading LEAStereo: Hierarchical Neural Architecture Search for Deep Stereo Matching
2022-06-10 04:45:00 【CV research Capriccio】
Paper: https://arxiv.org/pdf/2010.13501.pdf
Code: https://github.com/XuelianCheng/LEAStereo
Summary

Neural architecture search (NAS) has been applied in many domains. Its basic idea is to let a model find, within a search space (e.g., different convolution kernel sizes) and according to a chosen search strategy, the architecture best suited to the task. Current stereo matching methods rely on complex, hand-designed model structures, and NAS has not yet been widely applied in this field. This paper presents an end-to-end, hierarchically trained NAS framework that incorporates human model-design knowledge into the architecture search to obtain an architecture for the stereo matching task. The model follows the commonly used stereo matching pipeline (feature extraction, cost volume construction, dense matching). NAS is used to search both the feature network and the matching network, obtaining the optimal structure from a search space that simultaneously covers different architectures, feature map sizes, and output disparity ranges.
Model architecture

Search space for the matching task

In deep learning, segmentation and stereo matching tasks commonly use encoder-decoder structures, but U-Net-style methods are difficult to train. Volumetric stereo matching methods exploit a better inductive bias (human knowledge of model design) and therefore converge faster and perform better: they construct a 3D cost volume and then regress the disparity map from it. This, however, brings a large amount of computation, which poses a challenge for searching model structures with a NAS algorithm.
This paper presents a hierarchical search method over two levels: cell-level structure search and network-level structure search. In this work, the geometric knowledge of stereo matching is embedded into the architecture search process. The overall model consists of 4 parts:
- a feature network that produces 2D feature maps;
- construction of a 4D matching cost volume;
- a matching network for cost aggregation and matching cost computation;
- a soft-argmax layer that regresses the cost volume to a disparity map.
Only the feature network and the matching network contain trainable parameters, so NAS structure search is applied only to these two modules. The overall model structure is shown in Figure 2.
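The final soft-argmax regression step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the cost-volume shape `(D, H, W)` and the convention of negating costs before the softmax are assumptions for the sketch:

```python
import numpy as np

def soft_argmax_disparity(cost_volume):
    """Regress a disparity map from a cost volume via soft-argmax.

    cost_volume: array of shape (D, H, W) holding a matching cost for
    each candidate disparity d in [0, D); lower cost = better match.
    Returns an (H, W) disparity map: a weighted average of disparity
    levels, weighted by a softmax over the negated costs.
    """
    D = cost_volume.shape[0]
    logits = -cost_volume                               # low cost -> high weight
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)
    disparities = np.arange(D, dtype=np.float64).reshape(D, 1, 1)
    return (probs * disparities).sum(axis=0)

# A toy cost volume whose minimum sits at disparity 3 for every pixel
cost = np.full((8, 2, 2), 10.0)
cost[3] = 0.0
d = soft_argmax_disparity(cost)  # each entry is close to 3.0
```

Because the soft-argmax is differentiable (unlike a hard argmin over the cost volume), the whole pipeline can be trained end-to-end.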
Cell-level search space

The core search unit of NAS is called a cell. A cell is defined as a fully connected directed acyclic graph with $N$ nodes; each cell contains 2 input nodes, 1 output node, and 3 intermediate nodes. For layer $l$, the output node is denoted $C_l$, and the input nodes of this layer are the output nodes of the previous two layers ($C_{l-2}$, $C_{l-1}$). Let $O$ denote the set of operations in the search space (2D convolution, skip connection, etc.). During the structure search, an intermediate node $s^{(j)}$ is computed as:

$$\boldsymbol{s}^{(j)}=\sum_{i \sim j} o^{(i, j)}\left(\boldsymbol{s}^{(i)}\right) \tag{1}$$

where $\sim$ means node $i$ is connected to node $j$, and:

$$o^{(i, j)}(\boldsymbol{x})=\sum_{r=1}^{\nu} \frac{\exp \left(\alpha_{r}^{(i, j)}\right)}{\sum_{s=1}^{\nu} \exp \left(\alpha_{s}^{(i, j)}\right)} o_{r}^{(i, j)}(\boldsymbol{x})\tag{2}$$

where $o_{r}^{(i, j)}$ is the $r$-th candidate operation between the two nodes. The weights (confidences) $\left(\alpha_{1}^{(i, j)}, \alpha_{2}^{(i, j)}, \cdots, \alpha_{\nu}^{(i, j)}\right)$ of the operations in the search space are normalized by a softmax. Finally, the discrete structure is formed by selecting, between each pair of connected nodes, the operation with the largest weight, i.e. $o^{(i, j)}=o_{r^{*}}^{(i, j)}$ with $r^{*}=\arg \max _{r} \alpha_{r}^{(i, j)}$. In this process, only one kind of cell structure is searched for each of the feature network and the matching network; changes in spatial resolution are handled by the network-level search. A limitation of the DARTS search method is that the nodes $C_{l-2}, C_{l-1}, C_l$ must have the same spatial and channel dimensions. To handle resolution differences between adjacent cells, mismatched feature maps are adjusted to the same resolution by upsampling or downsampling.
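The continuous relaxation of Eq. (2) and its later discretization by argmax can be sketched as follows. The two candidate operations here are hypothetical stand-ins (a "conv" stubbed as a doubling map and an identity skip connection), chosen only to keep the example self-contained:

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Hypothetical candidate ops between nodes i and j:
# a "conv" (stubbed as doubling) and a skip connection (identity).
ops = [lambda x: 2.0 * x, lambda x: x]

def mixed_op(x, alpha):
    """Eq. (2): o(x) = sum_r softmax(alpha)_r * o_r(x)."""
    w = softmax(np.asarray(alpha, dtype=np.float64))
    return sum(wr * op(x) for wr, op in zip(w, ops))

def discretize(alpha):
    """After search: keep only the op with the largest weight alpha_r."""
    return ops[int(np.argmax(alpha))]
```

With equal logits `alpha = [0.0, 0.0]`, `mixed_op(1.0, alpha)` blends both ops evenly (giving 1.5); after discretization, only the higher-weighted op survives.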
Residual cells

Previous studies concatenate the outputs of all intermediate nodes to form the cell output; this design is called a direct cell. Inspired by ResNet, skip connections are added between adjacent nodes (the red lines in Figure 3), which lets the network learn residuals on top of the original structure. Experiments show that this residual structure yields better results.
Candidate operation sets

Because their functions differ, the feature network and the matching network use different candidate operation sets. The feature network aims to extract discriminative local features for pixel-wise cost volume construction, and it is empirically observed that removing the dilated separable convolutions and pooling layers used in DARTS does not affect model performance. Therefore, the candidate operation set of the feature network is $O^F = \{\text{"3x3 conv2D"},\ \text{"skip connection"}\}$, and that of the matching network is $O^M = \{\text{"3x3x3 conv3D"},\ \text{"skip connection"}\}$.
Network-level search space

The network-level search space is defined as the arrangement of cells; it controls the change of feature dimensions and the flow of information between cells. As shown on the right of Figure 3, the goal is to find an optimal path in a predefined L-layer grid. Regarding the number of filters in each cell, the channel count is doubled whenever the height and width of the feature tensor are halved.

The network-level search space has two hyperparameters: the minimum resolution and the number of layers L. The minimum resolution is set to $\frac{1}{24}$ of the original size; the per-level downsampling rates are {3, 2, 2, 2}, so the smallest feature map is $\frac{1}{24}$ of the input resolution. At the beginning of the feature network there is a three-layer "stem": the first layer is a 3x3 Conv2d with stride 3, and the second and third layers are 3x3 Conv2d with stride 1. For the number of layers, the paper sets $L^F=6$ and $L^M=12$, which provides a good balance between computational load and network performance.
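The resolution schedule implied by these rates can be checked with a tiny script (the 528x960 input size is only an example chosen to be divisible by 24):

```python
# Cumulative downsampling from the per-level rates {3, 2, 2, 2}:
# the stem's stride-3 conv gives the first factor, and each further
# level halves the resolution, so the coarsest map is 1/24 of the input.
rates = [3, 2, 2, 2]
scales = []
scale = 1
for r in rates:
    scale *= r
    scales.append(scale)
# scales == [3, 6, 12, 24]

# Example feature-map sizes for a 528x960 input
H, W = 528, 960
sizes = [(H // s, W // s) for s in scales]  # coarsest level: (22, 40)
```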
Analogous to finding the best operation between nodes, a set of search parameters $\beta$ is used to search the grid for the path that minimizes the loss.
Loss function

A smooth L1 loss is used:

$$\mathcal{L}=\ell\left(\mathbf{d}_{\text{pred}}-\mathbf{d}_{\mathrm{gt}}\right), \quad \text{where } \ell(x)=\begin{cases} 0.5 x^{2}, & |x|<1 \\ |x|-0.5, & \text{otherwise} \end{cases}\tag{3}$$
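Eq. (3) is straightforward to implement; a minimal NumPy version (averaging over all pixels, which is an assumption about the reduction) looks like this:

```python
import numpy as np

def smooth_l1(pred, gt):
    """Smooth L1 loss of Eq. (3): quadratic for |x| < 1, linear otherwise."""
    x = np.abs(pred - gt)
    return float(np.where(x < 1.0, 0.5 * x * x, x - 0.5).mean())

# |x| < 1 hits the quadratic branch; |x| >= 1 hits the linear branch
small = smooth_l1(np.array([0.5]), np.array([0.0]))  # 0.5 * 0.5^2 = 0.125
large = smooth_l1(np.array([2.0]), np.array([0.0]))  # 2.0 - 0.5   = 1.5
```

The quadratic branch keeps gradients small near zero error, while the linear branch bounds the gradient magnitude for large outliers.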
After continuous relaxation, the network weights $w$ and the architecture parameters $\alpha, \beta$ are optimized by bi-level optimization, where $\alpha$ parameterizes the cell-level structure and $\beta$ the network-level structure. A first-order approximation is used to speed up the search, and to avoid overfitting, two disjoint training sets, train-I and train-II, are used to alternate the optimization of $w$ and $\alpha, \beta$:
- on train-I, update $w$ via $\nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}, \boldsymbol{\alpha}, \boldsymbol{\beta})$;
- on train-II, update $\alpha, \beta$ via $\nabla_{\boldsymbol{\alpha}, \boldsymbol{\beta}} \mathcal{L}(\mathbf{w}, \boldsymbol{\alpha}, \boldsymbol{\beta})$.
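The alternating scheme above can be illustrated on a toy quadratic objective. This is only a sketch: plain analytic gradient steps stand in for the real first-order DARTS-style updates, a single shared loss replaces the two data splits, and the loss and learning rate are illustrative choices:

```python
# Toy stand-in: w plays the role of the network weights, a the role of
# the architecture parameters (alpha, beta). In the paper each step uses
# a different data split (train-I / train-II); here both steps share one
# analytic loss for simplicity. The joint minimum is at w = a = 0.
def loss(w, a):
    return (w - a) ** 2 + 0.1 * a ** 2

w, a, lr = 0.0, 1.0, 0.1
for _ in range(1000):
    # "train-I" step: gradient of the loss w.r.t. w is 2 * (w - a)
    w -= lr * 2.0 * (w - a)
    # "train-II" step: gradient w.r.t. a is -2 * (w - a) + 0.2 * a
    a -= lr * (-2.0 * (w - a) + 0.2 * a)
# both variables drift together toward the joint minimum at 0
```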
After the optimization converges, the two operations with the highest weights between each pair of nodes are retained to form the cell, and the network structure is obtained by finding the path of maximum probability in the grid.

Experimental results



