Interpretation of transpose convolution theory (input-output size analysis)
2022-07-07 19:42:00 · Midnight rain
Transposed convolution
Transposed convolution is an upsampling operation commonly used in image segmentation; it appears in U-Net, FCN, and similar networks. Its main purpose is to upsample a low-resolution feature map back to the resolution of the original image, so that a segmentation result can be produced at that resolution. Transposed convolution is also called deconvolution or fractionally strided convolution, but it is not actually the inverse of convolution: only the shapes of input and output correspond. For example, a convolution with stride 2 reduces the input feature map to 1/4 of its original area, and the matching transposed convolution restores the reduced feature map to the original size, but the values at corresponding positions are not recovered.
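The shape round trip just described can be sketched in a few lines of plain Python (the function names and the 4 → 2 → 4 example sizes are assumptions for illustration, not from the original text):

```python
# Output width of an ordinary convolution (formula (1) below)
def conv_out(W, F, S, P):
    return (W + 2 * P - F) // S + 1

# Output width of the transposed convolution defined by the forward
# parameters F, S, P (a = extra padding for the non-divisible case)
def tconv_out(W, F, S, P, a=0):
    return S * (W - 1) - 2 * P + F + a

W1 = 4
W2 = conv_out(W1, F=2, S=2, P=0)      # 4 -> 2: area shrinks to 1/4
back = tconv_out(W2, F=2, S=2, P=0)   # 2 -> 4: shape restored, values not
print(W2, back)                       # 2 4
```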
The connection between convolution and transposed convolution
To introduce transposed convolution, we first review the principle of convolution. In general, a convolution operation establishes a local connection between a rectangular region of the input image and a single value of the output feature map; it is a region-to-element mapping [1]. The following example illustrates this:
From this picture alone it is hard to see how convolution could be inverted or transposed (frequency-domain methods aside). We can, however, express convolution in the more regular form of a matrix multiplication: flatten the input image $X$ and the output feature map $Y$ into vectors, and embed the elements of the convolution kernel into the corresponding positions of a sparse weight matrix $W$. Convolution can then be written as [2]:
$$Y = WX$$
See the figure below for a demonstration of the process [3]:
Conversely, if we want to recover $X$ from $Y$, the most direct idea is to left-multiply by the inverse of the weight matrix. But $W$ is not square, so no inverse exists; the best we can do is construct a new matrix which, left-multiplied with $Y$, yields an output with the same shape as $X$, and which satisfies shape($W'$) = shape($W$):
$$X = W'^T Y$$
At the same time, this matrix must realize an element-to-region mapping, with the positional correspondence between input and output preserved. A matrix satisfying these conditions can itself be written as a convolution kernel applied to $Y$, so the matrix multiplication can again be carried out as a convolution. This is why transposed convolution is also called deconvolution, although it involves neither matrix inversion nor true deconvolution, only the transpose of the weight matrix (and not exactly that either, since the values differ; only the shape is the same).
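The matrix view above can be made concrete with a small NumPy sketch (the sizes, a length-5 input and a length-3 kernel at stride 1, are assumed for illustration): building the sparse matrix $W$, checking that $WX$ matches the sliding-window convolution, and observing that $W^T$ maps an output-shaped vector back to an input-shaped one:

```python
import numpy as np

x = np.arange(5.0)             # flattened input X, length 5
k = np.array([1.0, 2.0, 3.0])  # kernel, length 3, stride 1, no padding

# Embed the kernel into the sparse weight matrix W (3 x 5)
W = np.zeros((3, 5))
for i in range(3):
    W[i, i:i + 3] = k

y = W @ x                      # convolution as Y = W X
x_back = W.T @ y               # transposed map: output shape = input shape

# The matrix product matches the direct sliding-window convolution
y_direct = np.array([x[i:i + 3] @ k for i in range(3)])
assert np.allclose(y, y_direct)
print(y.shape, x_back.shape)   # (3,) (5,)
```

Note that `x_back` has the shape of `x` but not its values, exactly as the text warns.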
Transposed convolution output shape analysis
The previous section matches most analyses found online. The analysis of transposed-convolution input and output shapes, however, lacks a unified treatment: it usually starts either from an existing library function or from transposed convolution in isolation, without connecting it back to the forward convolution. This section therefore derives, from a theoretical starting point, the relationship between the parameters of a transposed convolution and those of its forward convolution, and from it the output-size formula.
Convolution input and output parameters
First, for an ordinary convolution, determining the output size is straightforward; it satisfies the following formula:
$$W_2 = \frac{W_1 + 2P_1 - F_1}{S_1} + 1 \tag{1}$$
This formula is easy to understand: $W_1 + 2P_1$ is the length of the padded input, $W_1 + 2P_1 - F_1$ is the last position at which a full convolution kernel still fits, and $\frac{W_1 + 2P_1 - F_1}{S_1}$ counts how many moves of size $S_1$ the kernel needs to reach that last position. Adding 1 for the initial position gives the formula.
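As a sanity check of formula (1), one can enumerate the kernel's start positions explicitly (the example numbers $W_1 = 5$, $F_1 = 3$, $S_1 = 2$, $P_1 = 1$ are assumed):

```python
W1, F1, S1, P1 = 5, 3, 2, 1
padded = W1 + 2 * P1                           # length after padding: 7
starts = list(range(0, padded - F1 + 1, S1))   # valid kernel start positions
W2 = (W1 + 2 * P1 - F1) // S1 + 1              # formula (1)
print(starts, W2)                              # [0, 2, 4] 3
assert len(starts) == W2
```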
Transposed convolution input and output parameters
Since a transposed convolution is itself a kind of convolution, its input and output must likewise satisfy:
$$W_1 = \frac{W_2 + 2P_2 - F_2}{S_2} + 1 \tag{2}$$
Once the forward convolution is fixed, we use the region-to-element correspondence between the forward and transposed convolutions to determine the parameters of the transposed convolution.
The shape of the convolution kernel does not change: the forward and transposed weight matrices have the same number of non-zero entries in corresponding positions, which is exactly what allows both operations to be realized as convolutions. Hence $F_2 = F_1$.
For the stride parameter: in the forward convolution, the stride determines the overlap between the input regions $X_i, X_{i+1}$ of two adjacent kernel positions (the overlap has width $F - S$); that is, considering only the first row of the kernel, only $F - S$ input values contribute to both outputs at once. For the transposed convolution this means that two adjacent inputs $Y_i, Y_{i+1}$ should jointly contribute to only $F - S$ outputs. To achieve this, we insert zero values between adjacent inputs, slowing down the motion of the kernel. This is why transposed convolution is also called fractionally strided convolution: $S_1 - 1$ zeros must be inserted to preserve the input-output correspondence, so that where the forward convolution crossed $S$ elements in one step, the transposed convolution now takes $S$ steps to cross one element, i.e. $S_2 = \frac{1}{S_1}$. With such a fractional stride, formula (2) no longer applies directly, so we instead take $S_2 = 1$ and insert $S_1 - 1$ zero values between inputs; the formula becomes:
$$W_1 = \frac{W_2 + 2P_2 - F_2 + (W_2 - 1)(S_1 - 1)}{1} + 1$$
With the first two parameters determined, combining the two formulas yields the relationship between the padding values:
$$\begin{aligned}
S_1(W_2 - 1) + F - 2P_1 &= W_2 + 2P_2 - F + (W_2 - 1)(S_1 - 1) + 1 \\
F - 2P_1 &= 2P_2 - F + 2 \\
P_2 &= F - P_1 - 1
\end{aligned}$$
This correspondence of the three parameters ensures that the element correspondence is unchanged through the forward and transposed convolutions; readers can verify this themselves [4].
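The parameter correspondence can indeed be verified numerically. The following NumPy sketch (with assumed values $F = 3$, $S = 2$, $P = 0$ and an arbitrary input) checks that inserting $S - 1$ zeros, padding with $P_2 = F - P - 1$ zeros per side, and convolving at stride 1 with the flipped kernel reproduces $W^T Y$ exactly:

```python
import numpy as np

F, S, P = 3, 2, 0               # forward kernel size, stride, padding
k = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0])        # transposed-conv input (W2 = 2)

# Ground truth: build the forward weight matrix W (2 x 5) and apply W^T
W = np.zeros((2, 5))
for i in range(2):
    W[i, i * S:i * S + F] = k
ref = W.T @ y

# Convolution view: insert S-1 zeros between inputs, pad P2 = F - P - 1
# zeros per side, slide the flipped kernel at stride 1
y_up = np.zeros(S * (len(y) - 1) + 1)
y_up[::S] = y                                   # [4, 0, 5]
y_pad = np.pad(y_up, F - P - 1)                 # two zeros on each side
out = np.array([y_pad[j:j + F] @ k[::-1]
                for j in range(len(y_pad) - F + 1)])

assert np.allclose(out, ref)
print(out)                      # values [4, 8, 17, 10, 15]
```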
Transposed convolution input-output relationship
A transposed convolution is usually also defined by a kernel size, stride, and padding, $F, S, P$. Here $S$ is really the reciprocal of the true stride, and $S - 1$ is the number of zeros inserted between adjacent elements. This is also the most confusing point: when defining a transposed convolution, to emphasize that it is the transpose of some convolution, we describe it with the parameters of that forward convolution. From the parameter correspondence derived in the previous two sections, the output size of the transposed convolution is:
$$\begin{aligned}
O &= S(W - 1) + 1 + 2(F - P - 1) - F + 1 \\
  &= S(W - 1) - 2P + F
\end{aligned}$$
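The simplification in the last step can be double-checked by brute force over small parameter ranges (the ranges themselves are arbitrary):

```python
# Step-by-step size: zero-inserted length S(W-1)+1, plus 2(F-P-1) padding,
# minus F, plus 1 -- compared against the closed form S(W-1) - 2P + F.
for W in range(2, 8):
    for F in range(1, 5):
        for S in range(1, 4):
            for P in range(F):        # keep P2 = F - P - 1 >= 0
                stepwise = (S * (W - 1) + 1) + 2 * (F - P - 1) - F + 1
                closed = S * (W - 1) - 2 * P + F
                assert stepwise == closed
print("closed form confirmed")
```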
The non-divisible case
Formula (1) is actually not exact, because $\frac{W_1 + 2P_1 - F_1}{S_1}$ may not be an integer, in which case the division is rounded down by default. This means that all values of $W_1 + 2P_1 - F_1$ in a range $[NS, NS + a]$, $a \in \{0, 1, \ldots, S - 1\}$, map to output results of the same size. To recover this part of the size, we must compute the number of omitted positions [5]:
$$a = (W_1 + 2P_1 - F_1) \bmod S$$
Then, when computing the transposed convolution's padding, add $a$ extra zeros. Whether they go on the top-left or the bottom-right side depends on the library function used.
Summary
In summary, a transposed convolution is defined by four parameters, all taken from its corresponding forward convolution: the forward stride, kernel size, padding, and remainder, $S, F, P, a$. With these, the forward-convolution input size, which is the transposed-convolution output size, is:
$$O = S(W - 1) - 2P + F + a$$
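A round-trip check in plain Python ties everything together (the example values $F = 3$, $S = 2$, $P = 1$ are arbitrary; the role of $a$ corresponds to what some frameworks expose as an "output padding" argument, e.g. `output_padding` in PyTorch's `ConvTranspose2d`):

```python
def conv_out(W1, F, S, P):
    return (W1 + 2 * P - F) // S + 1          # formula (1), floor division

def tconv_out(W2, F, S, P, a):
    return S * (W2 - 1) - 2 * P + F + a       # final formula

F, S, P = 3, 2, 1
for W1 in range(4, 20):
    a = (W1 + 2 * P - F) % S                  # omitted remainder
    W2 = conv_out(W1, F, S, P)
    assert tconv_out(W2, F, S, P, a) == W1    # original size recovered
print("round trip OK")
```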
References
1. Understand deconvolution and transposed convolution in one article
2. Detailed derivation of deconvolution (Transposed Convolution)
3. A Comprehensive Introduction to Different Types of Convolutions in Deep Learning
4. The principle of ConvTranspose2d: how do deep networks upsample?
5. Detailed explanation and derivation of the relationship between convolution and deconvolution (also called transposed convolution or fractionally strided convolution)