
OCR-GAN [Anomaly Detection: Reconstruction-Based]

2022-07-28 22:43:00 【It's too simple】

Preface

This post covers a paper from June 2022 (CVPR). According to the Papers with Code statistics, the paper's code ranks 6th on the MVTec dataset (as of the time of posting).

Background

Idea: this is a reconstruction-based approach. The information lost and gained during reconstruction is extracted by subtracting images at different frequencies. A U-Net then extracts further feature information, while the attention mechanism in the middle handles the information interaction between the images and the adaptive channel selection. Finally, an ordinary convolutional network is used to build the loss function that updates the network and to define the evaluation score. In short, the model turns the image into feature maps at different scales through reconstruction, and mixes the information of these feature maps thoroughly through the U-Net and the attention mechanism.

The "frequency domain", "frequency bands" and similar terms in the paper simply refer to different views of the same image after it has passed through the FD module. The deeper the $I_{G}$ part of the module processes the image, that is, the further it moves from high frequency to low frequency, the more texture information is lost. The lost texture information is kept by image subtraction, and this gives the images at different frequencies that the paper talks about. The higher the frequency, the more texture information an image contains; the lower the frequency, the more spatial semantic information it contains. The FD module is introduced below.

Model principle (source code)

The source code only implements the case where FD splits the image into 2 images. The figure above shows the generalized idea, where FD splits the image into n images, so take care to tell the two apart. (PS: when reading the code I only found 2 images after FD, sob.)

To make things easier to follow, the headings below correspond to the three blocks of the model (the three groups of capital letters in the figure), and the source code path for each block is listed as well.

FD

train_all.py-->def train()-->data-->train_ds-->def FD(img)

The input image is resized to (256, 256), giving $I$. A Gaussian filter is applied and the even rows and even columns are discarded, which downsamples the image. To upsample, zero-valued rows and columns are inserted next to each pixel and a Gaussian filter is applied again, which brings the image back to (256, 256) and gives $I_{G1}$. Subtracting $I_{G1}$ from $I$ gives $I_{2}$. $I_{1}$ is $I$ itself, so $I_{1}$ and $I_{2}$ are fed into the next module together.
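The FD steps above can be sketched in a few lines of NumPy. This is only a minimal illustration under assumptions, not the code from the repository: it assumes a single-channel float image, a 5-tap binomial kernel as the Gaussian approximation, and reflect padding at the borders. The function names `blur`, `downsample`, `upsample` and `frequency_decoupling` are made up for this example.

```python
import numpy as np

# Assumed kernel: 5-tap binomial filter, a common Gaussian approximation.
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img, kernel=KERNEL):
    """Separable blur of a 2-D array, using reflect padding at the borders."""
    pad = len(kernel) // 2
    padded = np.pad(img, pad, mode="reflect")
    height, width = img.shape
    # Filter along the column axis first, then along the row axis.
    rows = sum(k * padded[:, i:i + width] for i, k in enumerate(kernel))
    return sum(k * rows[i:i + height, :] for i, k in enumerate(kernel))


def downsample(img):
    """Blur, then keep every other row and every other column."""
    return blur(img)[::2, ::2]


def upsample(img):
    """Insert zero rows/columns between pixels, then blur."""
    up = np.zeros((img.shape[0] * 2, img.shape[1] * 2))
    up[::2, ::2] = img
    # The factor 4 compensates for the energy lost to the inserted zeros.
    return 4.0 * blur(up)


def frequency_decoupling(img):
    """Return (I_1, I_2): the image itself and the detail lost by down/up sampling."""
    i_g1 = upsample(downsample(img))  # low-frequency version, same size as img
    i_1 = img
    i_2 = img - i_g1                  # texture detail removed by the blur
    return i_1, i_2
```

Calling `frequency_decoupling` on a (256, 256) array returns two (256, 256) arrays. For a perfectly flat image the second one is zero, since a flat image has no texture to lose.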

CS

train_all.py-->def train()-->model-->ocr_gan_aug-->self.netg-->UnetGenerator_CS-->unet_block

Construction of the U-Net (too amazing!)

Because the structure only imitates U-Net, not every convolution block is set up in the U-Net style.

Main program (UnetGenerator_CS): see the U-Net structure diagram above. The first row is unet block 1, and there are five unet blocks in total (1 to 5). The program builds the unet blocks from the bottom of the figure to the top, and finally wraps them up in self.model.

Subroutine (unet_block): look at forward. When the model runs, the submodel=unet parameter of unet_block makes the blocks built in the main program run in the reverse of their construction order, which restores the running order shown in the structure diagram. (This happens in two steps. Every time execution reaches submodel it jumps into the next unet block, so each unet block only runs part of its code; once the last unet block has run, execution jumps back step by step until it reaches the first unet block. Step 1: the five structural blocks are convolved twice, producing five feature-layer blocks. Step 2: after the fourth jump, execution reaches block 5, the lowest block in the structure diagram. After one convolution its feature-layer block is returned to block 4 and fused with the feature-layer block there; after another convolution the result is returned to block 3, and so on up to block 1, which gives the final feature map.)
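The "build from the bottom, run from the top" behaviour described above can be shown with a toy example in plain Python. This is a hypothetical sketch, not the repository's unet_block: the class `UnetBlock`, the helper `build_unet` and the `trace` list are invented for illustration, and simple additions stand in for the real convolutions and the skip-connection fusion.

```python
class UnetBlock:
    """Toy stand-in for a nested U-Net block (no real convolutions)."""

    def __init__(self, level, submodule=None):
        self.level = level
        self.submodule = submodule  # the next, deeper block (None for the deepest)

    def forward(self, x, trace):
        # First half of the block: runs before jumping into the submodule.
        trace.append(f"down {self.level}")
        feat = x + 1  # placeholder for the encoder convolutions
        if self.submodule is None:
            out = feat
        else:
            inner = self.submodule.forward(feat, trace)
            out = feat + inner  # placeholder for fusing with the returned features
        # Second half of the block: runs after the submodule has returned.
        trace.append(f"up {self.level}")
        return out


def build_unet(depth=5):
    """Build the deepest block first, then wrap it level by level up to block 1."""
    block = UnetBlock(depth)
    for level in range(depth - 1, 0, -1):
        block = UnetBlock(level, submodule=block)
    return block
```

Running `build_unet(5).forward(0, trace)` with an empty list `trace` records the "down" half of blocks 1 to 5 first and then the "up" half of blocks 5 back to 1, which is the two-step order described above.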

CS block

After average pooling over the whole image, each feature layer of the image becomes a single value. One fully connected layer then reduces the number of channels and another fully connected layer restores it. A softmax() function turns the result into probabilities, which are multiplied with the feature maps to complete the module.

The CS block is responsible for the interaction between the frequency images (in this part, the pixel values of one feature layer are multiplied by the probability values of another feature layer), and it adds an attention mechanism that adaptively selects among the different channels.

Summary: after CS processing, the two feature maps are added to form $\hat{I}$.
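As a rough illustration of the pooling, two fully connected layers, softmax and multiply pipeline, here is a NumPy sketch. It rests on several assumptions that the text above does not confirm: the two branches are summed before pooling, a ReLU sits between the two fully connected layers, and the softmax is taken across the two branches for each channel. The name `channel_selection` and the weight shapes are invented for this example.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_selection(f1, f2, w_reduce, w_expand):
    """Toy channel selection over two feature maps.

    f1, f2   : feature maps of shape (C, H, W), one per frequency branch
    w_reduce : weights of shape (C_r, C), reduces the channel count
    w_expand : weights of shape (2, C, C_r), restores it once per branch
    """
    fused = f1 + f2                              # assumption: branches are summed first
    squeezed = fused.mean(axis=(1, 2))           # global average pooling -> (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0)  # FC + ReLU (ReLU is an assumption)
    logits = w_expand @ hidden                   # (2, C): one score per branch and channel
    weights = softmax(logits, axis=0)            # assumption: softmax across the branches
    # Reweight each branch channel by channel, then add the branches.
    i_hat = weights[0][:, None, None] * f1 + weights[1][:, None, None] * f2
    return i_hat, weights
```

Because the two weights for a channel sum to 1, feeding the same feature map into both branches returns that feature map unchanged. This is a quick sanity check for the sketch.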

D

The source code path for this part is very close to the one above. Following the structure in the figure, it is just a few convolution blocks.

This part acts as a discriminator and outputs the loss function. It helps the model distinguish abnormal images better, because a GAN cannot guarantee that abnormal images will not be reconstructed completely.