Reading LEAStereo: Hierarchical Neural Architecture Search for Deep Stereo Matching
2022-06-10 04:45:00 【CV research Capriccio】
Paper: https://arxiv.org/pdf/2010.13501.pdf
Code: https://github.com/XuelianCheng/LEAStereo
Summary

Neural architecture search (NAS) has been applied in many domains. Its basic idea is to let a model find, within a search space (e.g., different convolution kernel sizes) and according to a chosen search strategy, the architecture best suited to the task. Current stereo matching methods rely on complex, hand-designed model structures, and NAS has not yet been widely applied in this field. This paper presents an end-to-end, hierarchically trained NAS framework that incorporates human model-design knowledge into the architecture search to obtain an architecture for the stereo matching task. The model follows the commonly used stereo matching pipeline (feature extraction, cost volume construction, dense matching). NAS is used to search both the feature network and the matching network, obtaining the optimal structure from a search space that simultaneously covers different architectures, feature map sizes, and output disparity ranges.
Model architecture

Search space for the matching task

In deep learning, segmentation and stereo matching tasks commonly use encoder-decoder structures, but U-Net-style methods are difficult to train. Volumetric stereo matching methods exploit a better inductive bias (human knowledge of model design) and therefore converge faster and perform better: they construct a 3D cost volume and then regress the disparity map from it. This, however, brings a large amount of computation, which poses a challenge for searching model structures with a NAS algorithm.
This paper presents a hierarchical search method over two levels: cell-level structure search and network-level structure search. In this work, the geometric knowledge of stereo matching is embedded into the architecture search process. The overall model consists of 4 parts:
- a feature network that produces 2D feature maps;
- construction of a 4D matching cost volume;
- a matching network for cost aggregation and matching cost computation;
- a soft-argmax layer that regresses the cost volume to a disparity map.
Only the feature network and the matching network contain trainable parameters, so NAS structure search is applied only to these two modules. The overall model structure is shown in Figure 2.
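The final soft-argmax regression step can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the cost-volume shape `(D, H, W)` and the convention of negating costs before the softmax are assumptions for the sketch:

```python
import numpy as np

def soft_argmax_disparity(cost_volume):
    """Regress a disparity map from a cost volume via soft-argmax.

    cost_volume: array of shape (D, H, W) holding a matching cost for
    each candidate disparity d in [0, D); lower cost = better match.
    Returns an (H, W) disparity map: a weighted average of disparity
    levels, weighted by a softmax over the negated costs.
    """
    D = cost_volume.shape[0]
    logits = -cost_volume                               # low cost -> high weight
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)
    disparities = np.arange(D, dtype=np.float64).reshape(D, 1, 1)
    return (probs * disparities).sum(axis=0)

# A toy cost volume whose minimum sits at disparity 3 for every pixel
cost = np.full((8, 2, 2), 10.0)
cost[3] = 0.0
d = soft_argmax_disparity(cost)  # each entry is close to 3.0
```

Because the soft-argmax is differentiable (unlike a hard argmin over the cost volume), the whole pipeline can be trained end-to-end.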
Cell-level search space

The core search unit of NAS is called a cell. A cell is defined as a fully connected directed acyclic graph with $N$ nodes; each cell contains 2 input nodes, 1 output node, and 3 intermediate nodes. For layer $l$, the output node is denoted $C_l$, and the input nodes of this layer are the output nodes of the previous two layers ($C_{l-2}$, $C_{l-1}$). Let $O$ denote the set of operations in the search space (2D convolution, skip connection, etc.). During the structure search, an intermediate node $s^{(j)}$ is computed as:

$$\boldsymbol{s}^{(j)}=\sum_{i \sim j} o^{(i, j)}\left(\boldsymbol{s}^{(i)}\right) \tag{1}$$

where $\sim$ means node $i$ is connected to node $j$, and:

$$o^{(i, j)}(\boldsymbol{x})=\sum_{r=1}^{\nu} \frac{\exp \left(\alpha_{r}^{(i, j)}\right)}{\sum_{s=1}^{\nu} \exp \left(\alpha_{s}^{(i, j)}\right)} o_{r}^{(i, j)}(\boldsymbol{x})\tag{2}$$

where $o_{r}^{(i, j)}$ is the $r$-th candidate operation between the two nodes. The weights (confidences) $\left(\alpha_{1}^{(i, j)}, \alpha_{2}^{(i, j)}, \cdots, \alpha_{\nu}^{(i, j)}\right)$ of the operations in the search space are normalized by a softmax. Finally, the discrete structure is formed by selecting, between each pair of connected nodes, the operation with the largest weight, i.e. $o^{(i, j)}=o_{r^{*}}^{(i, j)}$ with $r^{*}=\arg \max _{r} \alpha_{r}^{(i, j)}$. In this process, only one kind of cell structure is searched for each of the feature network and the matching network; changes in spatial resolution are handled by the network-level search. A limitation of the DARTS search method is that the nodes $C_{l-2}, C_{l-1}, C_l$ must have the same spatial and channel dimensions. To handle resolution differences between adjacent cells, mismatched feature maps are adjusted to the same resolution by upsampling or downsampling.
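The continuous relaxation of Eq. (2) and its later discretization by argmax can be sketched as follows. The two candidate operations here are hypothetical stand-ins (a "conv" stubbed as a doubling map and an identity skip connection), chosen only to keep the example self-contained:

```python
import numpy as np

def softmax(a):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

# Hypothetical candidate ops between nodes i and j:
# a "conv" (stubbed as doubling) and a skip connection (identity).
ops = [lambda x: 2.0 * x, lambda x: x]

def mixed_op(x, alpha):
    """Eq. (2): o(x) = sum_r softmax(alpha)_r * o_r(x)."""
    w = softmax(np.asarray(alpha, dtype=np.float64))
    return sum(wr * op(x) for wr, op in zip(w, ops))

def discretize(alpha):
    """After search: keep only the op with the largest weight alpha_r."""
    return ops[int(np.argmax(alpha))]
```

With equal logits `alpha = [0.0, 0.0]`, `mixed_op(1.0, alpha)` blends both ops evenly (giving 1.5); after discretization, only the higher-weighted op survives.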
Residual cells

Previous studies concatenate the outputs of all intermediate nodes to form the cell output; this design is called a direct cell. Inspired by ResNet, skip connections are added between adjacent nodes (the red lines in Figure 3), which lets the network learn residuals on top of the original structure. Experiments show that this residual structure yields better results.
Candidate operation sets

Because their functions differ, the feature network and the matching network use different candidate operation sets. The feature network aims to extract discriminative local features for pixel-wise cost volume construction, and it is empirically observed that removing the dilated separable convolutions and pooling layers used in DARTS does not affect model performance. Therefore, the candidate operation set of the feature network is $O^F = \{\text{"3x3 conv2D"},\ \text{"skip connection"}\}$, and that of the matching network is $O^M = \{\text{"3x3x3 conv3D"},\ \text{"skip connection"}\}$.
Network-level search space

The network-level search space is defined as the arrangement of cells; it controls the change of feature dimensions and the flow of information between cells. As shown on the right of Figure 3, the goal is to find an optimal path in a predefined L-layer grid. Regarding the number of filters in each cell, the channel count is doubled whenever the height and width of the feature tensor are halved.

The network-level search space has two hyperparameters: the minimum resolution and the number of layers L. The minimum resolution is set to $\frac{1}{24}$ of the original size; the per-level downsampling rates are {3, 2, 2, 2}, so the smallest feature map is $\frac{1}{24}$ of the input resolution. At the beginning of the feature network there is a three-layer "stem": the first layer is a 3x3 Conv2d with stride 3, and the second and third layers are 3x3 Conv2d with stride 1. For the number of layers, the paper sets $L^F=6$ and $L^M=12$, which provides a good balance between computational load and network performance.
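The resolution schedule implied by these rates can be checked with a tiny script (the 528x960 input size is only an example chosen to be divisible by 24):

```python
# Cumulative downsampling from the per-level rates {3, 2, 2, 2}:
# the stem's stride-3 conv gives the first factor, and each further
# level halves the resolution, so the coarsest map is 1/24 of the input.
rates = [3, 2, 2, 2]
scales = []
scale = 1
for r in rates:
    scale *= r
    scales.append(scale)
# scales == [3, 6, 12, 24]

# Example feature-map sizes for a 528x960 input
H, W = 528, 960
sizes = [(H // s, W // s) for s in scales]  # coarsest level: (22, 40)
```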
Analogous to finding the best operation between nodes, a set of search parameters $\beta$ is used to search the grid for the path that minimizes the loss.
Loss function

A smooth L1 loss is used:

$$\mathcal{L}=\ell\left(\mathbf{d}_{\text{pred}}-\mathbf{d}_{\mathrm{gt}}\right), \quad \text{where } \ell(x)=\begin{cases} 0.5 x^{2}, & |x|<1 \\ |x|-0.5, & \text{otherwise} \end{cases}\tag{3}$$
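Eq. (3) is straightforward to implement; a minimal NumPy version (averaging over all pixels, which is an assumption about the reduction) looks like this:

```python
import numpy as np

def smooth_l1(pred, gt):
    """Smooth L1 loss of Eq. (3): quadratic for |x| < 1, linear otherwise."""
    x = np.abs(pred - gt)
    return float(np.where(x < 1.0, 0.5 * x * x, x - 0.5).mean())

# |x| < 1 hits the quadratic branch; |x| >= 1 hits the linear branch
small = smooth_l1(np.array([0.5]), np.array([0.0]))  # 0.5 * 0.5^2 = 0.125
large = smooth_l1(np.array([2.0]), np.array([0.0]))  # 2.0 - 0.5   = 1.5
```

The quadratic branch keeps gradients small near zero error, while the linear branch bounds the gradient magnitude for large outliers.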
After continuous relaxation, the network weights $w$ and the architecture parameters $\alpha, \beta$ are optimized by bi-level optimization, where $\alpha$ parameterizes the cell-level structure and $\beta$ the network-level structure. A first-order approximation is used to speed up the search, and to avoid overfitting, two disjoint training sets, train-I and train-II, are used to alternate the optimization of $w$ and $\alpha, \beta$:
- on train-I, update $w$ via $\nabla_{\mathbf{w}} \mathcal{L}(\mathbf{w}, \boldsymbol{\alpha}, \boldsymbol{\beta})$;
- on train-II, update $\alpha, \beta$ via $\nabla_{\boldsymbol{\alpha}, \boldsymbol{\beta}} \mathcal{L}(\mathbf{w}, \boldsymbol{\alpha}, \boldsymbol{\beta})$.
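The alternating scheme above can be illustrated on a toy quadratic objective. This is only a sketch: plain analytic gradient steps stand in for the real first-order DARTS-style updates, a single shared loss replaces the two data splits, and the loss and learning rate are illustrative choices:

```python
# Toy stand-in: w plays the role of the network weights, a the role of
# the architecture parameters (alpha, beta). In the paper each step uses
# a different data split (train-I / train-II); here both steps share one
# analytic loss for simplicity. The joint minimum is at w = a = 0.
def loss(w, a):
    return (w - a) ** 2 + 0.1 * a ** 2

w, a, lr = 0.0, 1.0, 0.1
for _ in range(1000):
    # "train-I" step: gradient of the loss w.r.t. w is 2 * (w - a)
    w -= lr * 2.0 * (w - a)
    # "train-II" step: gradient w.r.t. a is -2 * (w - a) + 0.2 * a
    a -= lr * (-2.0 * (w - a) + 0.2 * a)
# both variables drift together toward the joint minimum at 0
```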
After the optimization converges, the two operations with the highest weights between each pair of nodes are retained to form the cell, and the network structure is obtained by finding the path of maximum probability in the grid.

Experimental results



