A Detailed Look at the Affine Transformation Module and Conditional Batch Normalization (CBN) in Text-to-Image Generation
2022-07-27 04:56:00 【Medium coke with ice】
Models such as DF-GAN, SSA-GAN, and RATGAN all use affine transformations to build conditional batch normalization (or semantic conditional batch normalization) layers that constrain image synthesis. This article explains in detail what an affine transformation is, what conditional batch normalization is, and how the image distribution is fitted from the text distribution in text-to-image generation.
1. Affine Transformation
1.1 Concept
Affine transformation, also called affine mapping, is a commonly used transformation model in image processing and graphics. It describes a linear mapping between pairs of two-dimensional coordinate points and can be viewed as a linear transformation composed with a translation. It preserves the straightness and parallelism of two-dimensional figures: straightness means that straight lines remain straight after the transformation, and parallelism means that parallel lines remain parallel, so the relative positions of figures are preserved.
In graphics, affine transformations cover scaling, rotation, translation, and shear (skew).
1.2 Mathematical Principles
Any affine transformation with no translation (or a translation of 0) is a linear transformation, which can be described by the transformation matrix:

$$\left[\begin{array}{l} x^{\prime} \\ y^{\prime} \end{array}\right]=\left[\begin{array}{ll} a & b \\ c & d \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right]$$

Different transformations impose different constraints on a, b, c, d. For a scaling transformation, for example, a = α, d = β, and b = c = 0, so x′ = αx and y′ = βy: the image is scaled by a factor of α along the x axis and by β along the y axis.

To also cover translation, we add a dimension to the matrix (homogeneous coordinates):

$$\left[\begin{array}{l} x^{\prime} \\ y^{\prime} \\ 1 \end{array}\right]=\left[\begin{array}{lll} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{l} x \\ y \\ 1 \end{array}\right]$$

The parameters a, b, c, d, e, f now give six degrees of freedom, and each basic transformation constrains them differently. For a translation, a = 1, e = 1, b = d = 0, c = λ, f = θ, so x′ = x + λ and y′ = y + θ: the image is shifted by λ along the x axis and by θ along the y axis.
To rotate the image, trigonometric functions are introduced. With scale s, rotation angle θ, and translation (t_x, t_y), the final matrix transformation is defined as:

$$\left[\begin{array}{ccc} s \cos (\theta) & -s \sin (\theta) & t_{x} \\ s \sin (\theta) & s \cos (\theta) & t_{y} \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{l} x \\ y \\ 1 \end{array}\right]=\left[\begin{array}{l} x^{\prime} \\ y^{\prime} \\ 1 \end{array}\right]$$

In the end, an affine transformation is the composition of a linear transformation and a translation.
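To make the matrix form concrete, here is a minimal sketch in Python/NumPy (the helper name `affine_matrix` and the sample numbers are mine, purely for illustration): it builds the combined scale-rotation-translation matrix above and applies it to a point in homogeneous coordinates.

```python
import numpy as np

def affine_matrix(s, theta, tx, ty):
    """Build the 3x3 scale + rotation + translation matrix from Section 1.2."""
    return np.array([
        [s * np.cos(theta), -s * np.sin(theta), tx],
        [s * np.sin(theta),  s * np.cos(theta), ty],
        [0.0,                0.0,               1.0],
    ])

# Scale by 2, rotate 90 degrees, then translate by (3, 4).
M = affine_matrix(s=2.0, theta=np.pi / 2, tx=3.0, ty=4.0)
p = np.array([1.0, 0.0, 1.0])   # the point (1, 0) in homogeneous coordinates
x_new, y_new, _ = M @ p         # (1, 0) -> scaled/rotated to (0, 2) -> shifted to ~(3, 6)
print(x_new, y_new)
```

Note that the order matters: this matrix scales and rotates first, then translates, which is exactly the "linear transformation plus translation" decomposition above.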
1.3 How It Acts
An affine transformation gives an image geometric operations such as scaling, rotation, translation, and shear. When applied to sub-regions of an image, or to individual pixel blocks, it performs these scalings, rotations, translations, and shears at the micro level.
2. Conditional Batch Normalization
2.1 Batch Normalization
Batch normalization (BN), also called batch standardization, is a technique for improving the performance and stability of artificial neural networks. It provides any layer in the network with inputs that have zero mean and unit variance.

In BN, the network computes the mean and variance of each training mini-batch and uses them to normalize that batch of data. Because the normalized features are essentially confined to a standard normal distribution, the expressive power of the network is reduced. To solve this problem, BN introduces an affine transformation that adjusts the normalized data through a scale factor γ and a shift factor β.
The mathematics is as follows:

$$\hat{x}_{nchw} = \frac{x_{nchw}-\mu_{c}(x)}{\sigma_{c}(x)}, \qquad \mu_{c}(x) = \frac{1}{NHW}\sum_{n,h,w} x_{nchw}, \qquad \sigma_{c}(x) = \sqrt{\frac{1}{NHW}\sum_{n,h,w}\left(x_{nchw}-\mu_{c}(x)\right)^{2}+\epsilon}$$

$$\tilde{x}_{nchw} = \gamma_{c}\,\hat{x}_{nchw}+\beta_{c}$$
Given image features $x \in \mathbb{R}^{N \times C \times H \times W}$, BN first normalizes them by the mean and variance of each feature channel, then applies learned transformation factors γ and β that are fixed for every input. This mirrors an affine transformation: the scale factor γ plays the role of the linear part, and the shift factor β plays the role of the translation part.
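As a minimal sketch of these formulas (written with plain PyTorch tensor operations; the function name `batch_norm_forward` is mine, not a library API), the per-channel normalization over the N, H, W axes followed by the learned affine step looks like this:

```python
import torch

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W) features; gamma, beta: (C,) learned scale/shift factors."""
    mu = x.mean(dim=(0, 2, 3), keepdim=True)                  # mu_c(x)
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)  # per-channel variance
    x_hat = (x - mu) / torch.sqrt(var + eps)                  # normalize to ~N(0, 1)
    # Affine step: gamma is the "linear" part, beta the "translation" part.
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)

x = torch.randn(8, 64, 32, 32)
gamma = torch.ones(64)   # in a real layer these would be nn.Parameters
beta = torch.zeros(64)
y = batch_norm_forward(x, gamma, beta)
```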
2.2 Conditional Batch Normalization
However, in BN, γ and β are trainable parameters learned through back-propagation and are independent of any condition. Therefore, in conditional image generation tasks, conditional batch normalization (CBN) is widely used in place of BN. In CBN, γ and β are produced by an additional neural network. CBN fuses the low-level image information with the conditional information, so that the condition guides the feature expression of the image.

Both the scale factor γ and the shift factor β are derived from the conditional information. In text-to-image generation, the condition is the text embedding e, and γ and β are learned from it by an MLP:
$$\tilde{x}_{nchw} = \gamma(\text{con})\,\hat{x}_{nchw}+\beta(\text{con})$$

$$\gamma_{c} = P_{\gamma}(\bar{e}), \qquad \beta_{c} = P_{\beta}(\bar{e})$$
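A hedged sketch of CBN as a PyTorch module (the class and attribute names are illustrative, not the exact code of DF-GAN or SSA-GAN; I assume a sentence embedding $\bar{e}$ of dimension `cond_dim`): two linear layers play the roles of $P_{\gamma}$ and $P_{\beta}$, predicting per-channel γ and β from the text condition.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """BN whose gamma/beta are predicted from a condition vector (e.g. a text embedding)."""
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)  # normalization only
        self.gamma_mlp = nn.Linear(cond_dim, num_channels)    # plays the role of P_gamma
        self.beta_mlp = nn.Linear(cond_dim, num_channels)     # plays the role of P_beta

    def forward(self, x, e_bar):
        # x: (N, C, H, W) image features; e_bar: (N, cond_dim) text embedding
        x_hat = self.bn(x)
        gamma = self.gamma_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        beta = self.beta_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)
        return gamma * x_hat + beta

cbn = ConditionalBatchNorm2d(num_channels=64, cond_dim=256)
y = cbn(torch.randn(8, 64, 32, 32), torch.randn(8, 256))
```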
2.3 Semantic Conditional Batch Normalization
Without extra spatial information, the conditional BN of the previous step modulates the image feature map uniformly across all spatial locations. In text-to-image generation, however, we want the modulation to act only on the text-relevant parts of the feature map. This leads to semantic conditional batch normalization:
$$\tilde{x}_{nchw} = m_{i,(h,w)}\left(\gamma_{c}(\bar{e})\,\hat{x}_{nchw}+\beta_{c}(\bar{e})\right)$$
where the mask m not only decides where the text information is added, but also serves as a weight that determines how strongly the text information is enhanced at each location of the image feature map.
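Extending the CBN sketch above with a spatial mask gives a rough sketch of the semantic variant (again, the names and the way m is predicted are my own assumptions, not the exact SSA-GAN implementation): a small conv net predicts m ∈ [0, 1] with shape (N, 1, H, W) from the current features, and the modulation is weighted by it exactly as in the formula above.

```python
import torch
import torch.nn as nn

class SemanticConditionalBN2d(nn.Module):
    """CBN whose text modulation is weighted by a predicted spatial mask m."""
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)
        self.gamma_mlp = nn.Linear(cond_dim, num_channels)
        self.beta_mlp = nn.Linear(cond_dim, num_channels)
        # One plausible choice: predict the mask from the image features themselves.
        self.mask_net = nn.Sequential(
            nn.Conv2d(num_channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # keep the mask weights in [0, 1]
        )

    def forward(self, x, e_bar):
        x_hat = self.bn(x)
        gamma = self.gamma_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)
        beta = self.beta_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)
        m = self.mask_net(x)                # (N, 1, H, W) spatial weights
        return m * (gamma * x_hat + beta)   # the formula: x~ = m(gamma * x^ + beta)

scbn = SemanticConditionalBN2d(num_channels=64, cond_dim=256)
y = scbn(torch.randn(8, 64, 32, 32), torch.randn(8, 256))
```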
Partially based on: https://blog.csdn.net/weixin_41006390/article/details/108029877
Finally
About me: a graduate student in artificial intelligence, currently focusing on text-to-image (T2I) generation.
Follow me: Medium coke with more ice
Limited-time free subscription: the text-to-image (T2I) column
Support me: like + favorite + comment
If this article helped you a lot, I hope you'll click below and treat me to a coke! With extra ice