
An interpretation of transposed convolution theory (input-output size analysis)

2022-07-07 19:42:00 Midnight rain

Transposed convolution

  Transposed convolution is an upsampling technique commonly used in image segmentation; it appears in architectures such as U-Net and FCN, where its main purpose is to upsample a low-resolution feature map back to the resolution of the original image so that a segmentation result can be produced at full resolution. Transposed convolution is also called deconvolution or fractionally strided convolution, but it is not actually the inverse of convolution: only the input and output shapes correspond. For example, a convolution with stride 2 shrinks the input feature map to 1/4 of its original size, and a transposed convolution can restore the shrunken feature map to the original size, but the values at corresponding positions are not recovered.
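A minimal PyTorch sketch of this shape-only correspondence (the layer parameters and tensor sizes here are my own choices for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 8, 8)                              # original feature map
down = nn.Conv2d(1, 1, kernel_size=2, stride=2)          # 8x8 -> 4x4 (1/4 the area)
up = nn.ConvTranspose2d(1, 1, kernel_size=2, stride=2)   # 4x4 -> 8x8

y = down(x)
x_rec = up(y)
print(y.shape)                   # torch.Size([1, 1, 4, 4])
print(x_rec.shape)               # torch.Size([1, 1, 8, 8]) -- shape restored
print(torch.allclose(x, x_rec))  # False -- values are not recovered
```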

The connection between convolution and transposed convolution

  To introduce transposed convolution, we first need to review how convolution works. In general, a convolution operation establishes a local connection between a rectangular region of the input image and a single value of the output feature map; it is a region -> element mapping [1], as the following example shows:
[Figure: a convolution kernel sliding over the input, mapping each region to one output element]
Of course, in this form it is hard to see how the convolution operation could be reversed, so we express convolution in the more regular form of matrix multiplication: flatten the input image $X$ and the output feature map $Y$, and embed the elements of the convolution kernel at the corresponding positions of a sparse matrix to form a weight matrix $W$. Convolution can then be written as [2]
$$Y = WX$$
See the following figure for a step-by-step demonstration [3]:
[Figure: convolution expressed as multiplication by a sparse weight matrix $W$]
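As a sanity check on this matrix view, here is a small sketch (sizes chosen for illustration) that flattens a 3x3-kernel, stride-1 convolution on a 4x4 input into the form $Y=WX$, where $W$ is a 4x16 sparse matrix:

```python
import torch
import torch.nn.functional as F

X = torch.randn(4, 4)   # input image
K = torch.randn(3, 3)   # convolution kernel; the output is 2x2

# Reference result from the library convolution (stride 1, no padding).
Y_conv = F.conv2d(X.view(1, 1, 4, 4), K.view(1, 1, 3, 3)).view(2, 2)

# Build W: one row per output element, one column per input element,
# with the kernel's 9 values embedded at the positions each output reads.
W = torch.zeros(4, 16)
for i in range(2):              # output row
    for j in range(2):          # output column
        for a in range(3):      # kernel row
            for b in range(3):  # kernel column
                W[i * 2 + j, (i + a) * 4 + (j + b)] = K[a, b]

Y_mat = (W @ X.reshape(16)).reshape(2, 2)
print(torch.allclose(Y_conv, Y_mat, atol=1e-6))  # True
```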

Accordingly, if we want to go from $Y$ back to $X$, the most direct idea is to find the inverse of the weight matrix and left-multiply by it. But the weight matrix is not square, so the best we can do is construct a new matrix that, when left-multiplied with $Y$, yields an output with the same shape as $X$, and that satisfies shape(W') = shape(W):
$$X = W'^{T} Y$$

At the same time, this matrix must realize an element -> region mapping, and the positional correspondence between input and output must remain unchanged. A new matrix satisfying these conditions can in fact be assembled from a convolution kernel, so that the matrix multiplication with $Y$ can again be carried out as a convolution operation. This is why transposed convolution is also called deconvolution, even though it involves neither matrix inversion nor deconvolution in the signal-processing sense, but rather the transpose of the weight matrix. And not exactly the transpose, either: the values differ; only the shape and sparsity pattern are the same.
[Figure: transposed convolution expressed as multiplication by $W'^{T}$, an element -> region mapping]
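Continuing the sketch above: left-multiplying by $W^{T}$ sends each element of $Y$ back to a 3x3 region of a 4x4 output, and this is exactly what the library's transposed convolution computes with the same (unflipped) kernel:

```python
# W, K and Y_conv are taken from the previous snippet.
X_like = (W.T @ Y_conv.reshape(4)).reshape(4, 4)
X_tc = F.conv_transpose2d(Y_conv.view(1, 1, 2, 2), K.view(1, 1, 3, 3)).view(4, 4)
print(torch.allclose(X_like, X_tc, atol=1e-6))  # True: same shape as X, not the same values
```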

Transposed convolution output shape analysis

  The content of the previous section is similar to analyses that can be found elsewhere. However, analyses of the input and output shapes of transposed convolution lack a unified treatment: they usually start from an existing library function or from transposed convolution in isolation, without exploiting the intrinsic relationship between transposed convolution and convolution. This section therefore starts from theory, derives the relationships between the parameters of a transposed convolution and those of its corresponding forward convolution, and thereby determines the output-size formula.

Convolution input and output parameters

  First, for an ordinary convolution, the input and output sizes are easy to determine; they satisfy the following formula:
$$W_2=\frac{W_1+2P_1-F_1}{S_1}+1\tag{1}$$
This formula is easy to understand: $W_1+2P_1$ is the side length of the padded image, $W_1+2P_1-F_1$ is the last position in the image at which a convolution kernel can still fit, and $\frac{W_1+2P_1-F_1}{S_1}$ counts how many moves of stride $S_1$ the kernel needs to reach that last position. Adding 1 for the starting position gives the formula.
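A direct transcription of formula (1), checked against PyTorch (the function name and sample parameters are mine; note that the library floors the division, which is the complication discussed later):

```python
import torch
import torch.nn as nn

def conv_out_size(w1: int, f: int, s: int, p: int) -> int:
    # Formula (1) with floored division: W2 = (W1 + 2*P1 - F1) // S1 + 1
    return (w1 + 2 * p - f) // s + 1

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(torch.randn(1, 1, 11, 11)).shape[-1])  # 6
print(conv_out_size(11, 3, 2, 1))                 # 6
```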

Transpose convolution input and output parameters

  Since a transposed convolution is itself a kind of convolution, its input and output must also satisfy such a formula:
$$W_1=\frac{W_2+2P_2-F_2}{S_2}+1\tag{2}$$
Once the forward convolution is fixed, we can use the region -> element correspondence that must be preserved between forward and transposed convolution to determine the parameters of the transposed convolution.

  The shape of the convolution kernel does not change: in forward convolution and in transposed convolution, the number and positions of the non-zero terms in the weight matrix are the same, which is also what allows both to be expressed as convolution operations. Hence $F_2=F_1$.
  For the stride parameter: in forward convolution, the stride determines how much the input regions $X_i, X_{i+1}$ of two adjacent convolution operations overlap (the overlap has width $F-S$); that is, considering only the first row of the kernel, only $F-S$ input values contribute to both outputs at once. Correspondingly, in transposed convolution only $F-S$ outputs may involve both of two adjacent inputs $Y_i, Y_{i+1}$ at once. To achieve this, we insert zero values between adjacent inputs, slowing down the movement of the kernel. This is why transposed convolution is also called fractionally strided convolution: $S-1$ zeros are inserted to satisfy the input-output correspondence, so where the kernel originally crossed $S$ elements in one step, it now needs $S$ steps to cross one element, i.e. $S_2=\frac{1}{S_1}$. With such a fractional stride the earlier size formula no longer applies directly, so instead we set $S_2=1$ and insert $S_1-1$ zero values between adjacent inputs; the formula becomes:
$$W_1=\frac{W_2+2P_2-F_2+(W_2-1)(S_1-1)}{1}+1$$
  Having determined the first two parameters, the relationship between the padding values is obtained by combining the two formulas:
$$\begin{aligned} S_1(W_2-1)+F-2P_1 &= W_2+2P_2-F+(W_2-1)(S_1-1)+1 \\ F-2P_1 &= 2P_2-F+2 \\ P_2 &= F-P_1-1 \end{aligned}$$

The correspondence of these three parameters guarantees that the element correspondence does not change between the forward convolution and the transposed convolution; readers can verify this for themselves [4].
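One way to verify: build the transposed convolution by hand from the three correspondences, i.e. insert $S_1-1$ zeros between input elements, pad with $P_2=F-P_1-1$, and run an ordinary stride-1 convolution with the kernel flipped in both spatial dimensions; the result should match the library's transposed convolution (the parameter values below are an arbitrary example):

```python
import torch
import torch.nn.functional as F

S1, F1, P1 = 2, 3, 1
y = torch.randn(1, 1, 4, 4)        # input of the transposed convolution
k = torch.randn(1, 1, F1, F1)      # shared kernel

# Fractional stride via zero-insertion: S1 - 1 zeros between adjacent elements.
n = y.shape[-1]
y_dil = torch.zeros(1, 1, (n - 1) * S1 + 1, (n - 1) * S1 + 1)
y_dil[..., ::S1, ::S1] = y

P2 = F1 - P1 - 1                   # padding of the equivalent forward convolution
k_flipped = torch.flip(k, dims=[-2, -1])
out_manual = F.conv2d(y_dil, k_flipped, stride=1, padding=P2)
out_builtin = F.conv_transpose2d(y, k, stride=S1, padding=P1)
print(torch.allclose(out_manual, out_builtin, atol=1e-5))  # True
```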

Transpose convolution input-output relationship

  A transposed convolution is usually also defined by a kernel size, a stride, and a padding value, i.e. the three parameters $F, S, P$, where $S$ is actually the reciprocal of the true stride and $S-1$ is the number of zero values inserted between adjacent elements. This is the most confusing point: when defining a transposed convolution, to emphasize that it is transposed, we describe it by the parameters of its corresponding forward convolution. From the parameter correspondences of the previous two sections, the output size of a transposed convolution is:

$$\begin{aligned} O &= S(W-1)+1+2(F-P-1)-F+1 \\ &= S(W-1)-2P+F \end{aligned}$$
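A quick check of $O=S(W-1)-2P+F$ against the library over a few arbitrary parameter choices:

```python
import torch
import torch.nn as nn

for (w, f, s, p) in [(4, 3, 2, 1), (5, 4, 3, 0), (7, 2, 2, 0)]:
    tconv = nn.ConvTranspose2d(1, 1, kernel_size=f, stride=s, padding=p)
    o = tconv(torch.randn(1, 1, w, w)).shape[-1]
    print(o, s * (w - 1) - 2 * p + f)  # the two values agree in every case
```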

A more complex case

  Actually, formula (1) is not exact, because $\frac{W_1+2P_1-F_1}{S_1}$ may not be an integer; by default the division is rounded down. That is, every value of $W_1+2P_1-F_1$ in a range $[NS, NS+a]$, $a\in\{0,1,\dots,S-1\}$, corresponds to an output of the same size. To recover this lost part of the size, we need to compute the number of omitted positions [5]:
$$a=(W_1+2P_1-F_1) \bmod S$$
Then, when computing the transposed convolution, an extra $a$ zero values are added to the padding; whether they go on the top/left or the bottom/right depends on the library function used.
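For example, with $F=3$, $S=2$, $P=1$, inputs of width 7 and width 8 both produce width-4 outputs, so the transposed convolution alone cannot know which size to restore; PyTorch exposes the remainder $a$ as the output_padding argument (added on the bottom/right side):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, padding=1)
print(conv(torch.randn(1, 1, 7, 7)).shape[-1])  # 4
print(conv(torch.randn(1, 1, 8, 8)).shape[-1])  # 4 -- same output size

# a = (W1 + 2*P1 - F1) mod S1 = (8 + 2 - 3) % 2 = 1 recovers the 8x8 input shape.
up = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=2, padding=1, output_padding=1)
print(up(torch.randn(1, 1, 4, 4)).shape[-1])    # 8
```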

Summary

  All in all, a transposed convolution is defined by four parameters, all drawn from the corresponding forward convolution: the forward stride, the kernel size, the padding, and the remainder, i.e. $S, F, P, a$. With these parameters, the forward convolution's input size, which is the transposed convolution's output size, is:
$$O=S(W-1)-2P+F+a$$

References


1. Understanding deconvolution (transposed convolution) in one article
2. Detailed derivation of deconvolution (Transposed Convolution)
3. A Comprehensive Introduction to Different Types of Convolutions in Deep Learning
4. The principle of ConvTranspose2d: how does a deep network upsample?
5. A detailed explanation and derivation of the relationship between convolution and deconvolution (also called transposed convolution or fractionally strided convolution)

Copyright notice: this article was written by [Midnight rain]; when reposting, please include a link to the original: https://yzsam.com/2022/188/202207071724083948.html