A Detailed Look at the Affine Transformation Module and Conditional Batch Normalization (CBN) in Text-to-Image Generation
2022-07-27 04:56:00 【Medium coke with ice】
Models such as DF-GAN, SSA-GAN, and RATGAN all use affine transformations to build conditional batch normalization (or semantic conditional batch normalization) layers that constrain image synthesis. This article explains in detail what an affine transformation is, what conditional batch normalization is, and how the image distribution is fitted from the text distribution in text-to-image generation.
1. Affine Transformation
1.1 Concept
Affine transformation, also called affine mapping, is a commonly used transformation model in image processing and graphics. It describes a linear mapping between pairs of two-dimensional coordinate points and can be viewed as a linear transformation composed with a translation. It preserves the straightness and parallelism of two-dimensional figures: straightness means that straight lines remain straight after the transformation, and parallelism means that parallel lines remain parallel, so the relative positions of figures are preserved.
In graphics, affine transformations cover scaling, rotation, translation, and shear (skew).
1.2 Mathematical Principles
Any affine transformation with no translation (or a translation of 0) is a linear transformation, which can be described by the transformation matrix:

$$\left[\begin{array}{l} x^{\prime} \\ y^{\prime} \end{array}\right]=\left[\begin{array}{ll} a & b \\ c & d \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right]$$

Different transformations impose different constraints on a, b, c, d. For a scaling transformation, for example, a = α, d = β, and b = c = 0, so x′ = αx and y′ = βy: the image is scaled by a factor of α along the x axis and by β along the y axis.

To also cover translation, we add a dimension to the matrix (homogeneous coordinates):

$$\left[\begin{array}{l} x^{\prime} \\ y^{\prime} \\ 1 \end{array}\right]=\left[\begin{array}{lll} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{l} x \\ y \\ 1 \end{array}\right]$$

The parameters a, b, c, d, e, f now give six degrees of freedom, and each basic transformation constrains them differently. For a translation, a = 1, e = 1, b = d = 0, c = λ, f = θ, so x′ = x + λ and y′ = y + θ: the image is shifted by λ along the x axis and by θ along the y axis.
To rotate the image, trigonometric functions are introduced. With scale s, rotation angle θ, and translation (t_x, t_y), the final matrix transformation is defined as:

$$\left[\begin{array}{ccc} s \cos (\theta) & -s \sin (\theta) & t_{x} \\ s \sin (\theta) & s \cos (\theta) & t_{y} \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{l} x \\ y \\ 1 \end{array}\right]=\left[\begin{array}{l} x^{\prime} \\ y^{\prime} \\ 1 \end{array}\right]$$

In the end, an affine transformation is the composition of a linear transformation and a translation.
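To make the matrix form concrete, here is a minimal sketch in Python/NumPy (the helper name `affine_matrix` and the sample numbers are mine, purely for illustration): it builds the combined scale-rotation-translation matrix above and applies it to a point in homogeneous coordinates.

```python
import numpy as np

def affine_matrix(s, theta, tx, ty):
    """Build the 3x3 scale + rotation + translation matrix from Section 1.2."""
    return np.array([
        [s * np.cos(theta), -s * np.sin(theta), tx],
        [s * np.sin(theta),  s * np.cos(theta), ty],
        [0.0,                0.0,               1.0],
    ])

# Scale by 2, rotate 90 degrees, then translate by (3, 4).
M = affine_matrix(s=2.0, theta=np.pi / 2, tx=3.0, ty=4.0)
p = np.array([1.0, 0.0, 1.0])   # the point (1, 0) in homogeneous coordinates
x_new, y_new, _ = M @ p         # (1, 0) -> scaled/rotated to (0, 2) -> shifted to ~(3, 6)
print(x_new, y_new)
```

Note that the order matters: this matrix scales and rotates first, then translates, which is exactly the "linear transformation plus translation" decomposition above.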
1.3 How It Acts
An affine transformation gives an image geometric operations such as scaling, rotation, translation, and shear. When applied to sub-regions of an image, or to individual pixel blocks, it performs these scalings, rotations, translations, and shears at the micro level.
2. Conditional Batch Normalization
2.1 Batch Normalization
Batch normalization (BN), also called batch standardization, is a technique for improving the performance and stability of artificial neural networks. It provides any layer in the network with inputs that have zero mean and unit variance.

In BN, the network computes the mean and variance of each training mini-batch and uses them to normalize that batch of data. Because the normalized features are essentially confined to a standard normal distribution, the expressive power of the network is reduced. To solve this problem, BN introduces an affine transformation that adjusts the normalized data through a scale factor γ and a shift factor β.
The mathematics is as follows:

$$\hat{x}_{nchw} = \frac{x_{nchw}-\mu_{c}(x)}{\sigma_{c}(x)}, \qquad \mu_{c}(x) = \frac{1}{NHW}\sum_{n,h,w} x_{nchw}, \qquad \sigma_{c}(x) = \sqrt{\frac{1}{NHW}\sum_{n,h,w}\left(x_{nchw}-\mu_{c}(x)\right)^{2}+\epsilon}$$

$$\tilde{x}_{nchw} = \gamma_{c}\,\hat{x}_{nchw}+\beta_{c}$$
Given image features $x \in \mathbb{R}^{N \times C \times H \times W}$, BN first normalizes them by the mean and variance of each feature channel, then applies learned transformation factors γ and β that are fixed for every input. This mirrors an affine transformation: the scale factor γ plays the role of the linear part, and the shift factor β plays the role of the translation part.
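As a minimal sketch of these formulas (written with plain PyTorch tensor operations; the function name `batch_norm_forward` is mine, not a library API), the per-channel normalization over the N, H, W axes followed by the learned affine step looks like this:

```python
import torch

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, C, H, W) features; gamma, beta: (C,) learned scale/shift factors."""
    mu = x.mean(dim=(0, 2, 3), keepdim=True)                  # mu_c(x)
    var = x.var(dim=(0, 2, 3), keepdim=True, unbiased=False)  # per-channel variance
    x_hat = (x - mu) / torch.sqrt(var + eps)                  # normalize to ~N(0, 1)
    # Affine step: gamma is the "linear" part, beta the "translation" part.
    return gamma.view(1, -1, 1, 1) * x_hat + beta.view(1, -1, 1, 1)

x = torch.randn(8, 64, 32, 32)
gamma = torch.ones(64)   # in a real layer these would be nn.Parameters
beta = torch.zeros(64)
y = batch_norm_forward(x, gamma, beta)
```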
2.2 Conditional Batch Normalization
However, in BN, γ and β are trainable parameters learned through back-propagation and are independent of any condition. Therefore, in conditional image generation tasks, conditional batch normalization (CBN) is widely used in place of BN. In CBN, γ and β are produced by an additional neural network. CBN fuses the low-level image information with the conditional information, so that the condition guides the feature expression of the image.

Both the scale factor γ and the shift factor β are derived from the conditional information. In text-to-image generation, the condition is the text embedding e, and γ and β are learned from it by an MLP:
$$\tilde{x}_{nchw} = \gamma(\text{con})\,\hat{x}_{nchw}+\beta(\text{con})$$

$$\gamma_{c} = P_{\gamma}(\bar{e}), \qquad \beta_{c} = P_{\beta}(\bar{e})$$
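A hedged sketch of CBN as a PyTorch module (the class and attribute names are illustrative, not the exact code of DF-GAN or SSA-GAN; I assume a sentence embedding $\bar{e}$ of dimension `cond_dim`): two linear layers play the roles of $P_{\gamma}$ and $P_{\beta}$, predicting per-channel γ and β from the text condition.

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """BN whose gamma/beta are predicted from a condition vector (e.g. a text embedding)."""
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)  # normalization only
        self.gamma_mlp = nn.Linear(cond_dim, num_channels)    # plays the role of P_gamma
        self.beta_mlp = nn.Linear(cond_dim, num_channels)     # plays the role of P_beta

    def forward(self, x, e_bar):
        # x: (N, C, H, W) image features; e_bar: (N, cond_dim) text embedding
        x_hat = self.bn(x)
        gamma = self.gamma_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        beta = self.beta_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)
        return gamma * x_hat + beta

cbn = ConditionalBatchNorm2d(num_channels=64, cond_dim=256)
y = cbn(torch.randn(8, 64, 32, 32), torch.randn(8, 256))
```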
2.3 Semantic Conditional Batch Normalization
Without extra spatial information, the conditional BN of the previous step modulates the image feature map uniformly across all spatial locations. In text-to-image generation, however, we want the modulation to act only on the text-relevant parts of the feature map. This leads to semantic conditional batch normalization:
$$\tilde{x}_{nchw} = m_{i,(h,w)}\left(\gamma_{c}(\bar{e})\,\hat{x}_{nchw}+\beta_{c}(\bar{e})\right)$$
where the mask m not only decides where the text information is added, but also serves as a weight that determines how strongly the text information is enhanced at each location of the image feature map.
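Extending the CBN sketch above with a spatial mask gives a rough sketch of the semantic variant (again, the names and the way m is predicted are my own assumptions, not the exact SSA-GAN implementation): a small conv net predicts m ∈ [0, 1] with shape (N, 1, H, W) from the current features, and the modulation is weighted by it exactly as in the formula above.

```python
import torch
import torch.nn as nn

class SemanticConditionalBN2d(nn.Module):
    """CBN whose text modulation is weighted by a predicted spatial mask m."""
    def __init__(self, num_channels, cond_dim):
        super().__init__()
        self.bn = nn.BatchNorm2d(num_channels, affine=False)
        self.gamma_mlp = nn.Linear(cond_dim, num_channels)
        self.beta_mlp = nn.Linear(cond_dim, num_channels)
        # One plausible choice: predict the mask from the image features themselves.
        self.mask_net = nn.Sequential(
            nn.Conv2d(num_channels, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # keep the mask weights in [0, 1]
        )

    def forward(self, x, e_bar):
        x_hat = self.bn(x)
        gamma = self.gamma_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)
        beta = self.beta_mlp(e_bar).unsqueeze(-1).unsqueeze(-1)
        m = self.mask_net(x)                # (N, 1, H, W) spatial weights
        return m * (gamma * x_hat + beta)   # the formula: x~ = m(gamma * x^ + beta)

scbn = SemanticConditionalBN2d(num_channels=64, cond_dim=256)
y = scbn(torch.randn(8, 64, 32, 32), torch.randn(8, 256))
```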
Partially based on: https://blog.csdn.net/weixin_41006390/article/details/108029877
Finally
About me: a graduate student in artificial intelligence, currently focusing on text-to-image (T2I) generation.
Follow me: Medium coke with more ice
Limited-time free subscription: the text-to-image (T2I) column
Support me: like + favorite + comment
If this article helped you a lot, I hope you'll click below and treat me to a coke! With extra ice