Transformer for Anomaly Detection - InTra: "Inpainting Transformer for Anomaly Detection"
2022-07-28 19:26:00 【I'm Mr. rhubarb】
Paper link
https://arxiv.org/pdf/2104.13897v1.pdf
Paper reading approach
First Impressions
Reconstruction-based anomaly detection methods such as GANs and autoencoders (AEs) share a drawback: they often reconstruct anomalous samples well too, which causes detection errors. Some recent methods therefore recast generative reconstruction as an inpainting problem for anomaly detection: parts of the image are covered and then recovered, which can also be viewed as a self-supervised approach.
For this kind of inpainting problem, capturing long-range semantic information from a larger region helps reconstruct the covered area. CNNs, however, are limited by their receptive field and are not good at capturing long-range information. Inspired by the recent surge of Transformers in vision, the authors adopt a Transformer architecture to solve this problem. As shown in figure (a), during training the image is cut into equal-sized patches, and a covered patch is inpainted using the other patches in a larger region. Figure (b) shows the reconstruction results and the anomaly score map obtained from the pixel-level error.
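To make the patch setup concrete, here is a minimal sketch (ours, not the authors' code) of cutting an image into equal-sized blocks with PyTorch; the image size and channel count are placeholder values.

```python
import torch
import torch.nn.functional as F

K = 16                                 # patch side length in pixels (paper's value)
img = torch.rand(1, 3, 256, 256)       # dummy image; size/channels are placeholders

# Cut the image into non-overlapping K x K patches:
# (1, 3, 256, 256) -> (1, C*K*K, N*M) -> (1, N*M, C*K*K), with N = M = 256 / 16 = 16.
patches = F.unfold(img, kernel_size=K, stride=K).transpose(1, 2)
print(patches.shape)                   # torch.Size([1, 256, 768])
```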

Notably, the authors train only on the small number of samples in the MVTec AD dataset itself and still achieve state-of-the-art results.
A Closer Look
2. Related Work
Current anomaly detection/segmentation methods fall mainly into two categories: reconstruction-based methods such as AE, GAN, and VAE, and embedding-based methods, which mostly extract discriminative features with an ImageNet-pretrained CNN for comparison.
The section also reviews related work on inpainting and Transformers.
3. Inpainting Transformer for Anomaly Detection
A Transformer is trained on the inpainting task. At test time, the image is reconstructed in the same inpainting manner, and the difference between the input image and its reconstruction yields the detection result.
3.1 Embedding Patches and Positions
As shown in figure (a) above, the method performs inpainting within a square window of side length L (in patches), rather than over the whole image as in ViT. Two position encoding schemes are used: local encoding, shown on the left of the figure below, and global encoding, shown on the right.

Why are both encoding schemes needed? Intuitively, for texture images (left) the global position of a patch within the image is irrelevant, while for object categories (right) it matters.
Similar to the setup in ViT, the position embeddings are D-dimensional; each image patch is also mapped to D dimensions, and the two are added together. Note that one patch P(t, u) in the window is covered (masked); the paper treats it analogously to the class token in ViT:

This finally yields an L×L sequence of D-dimensional tokens, ready to be fed into the subsequent Transformer.
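A minimal sketch of this embedding step as we read it (not the official implementation); the initialization details and interface are our assumptions.

```python
import torch
import torch.nn as nn

class InTraEmbedding(nn.Module):
    """Sketch of Sec. 3.1: project each K*K*C patch to D dims and add either
    a local (within-window) or global (whole-grid) position embedding; the
    covered patch P(t, u) is replaced by a learned token, analogous to ViT's
    class token. Details here are our assumptions."""
    def __init__(self, K=16, C=3, D=512, L=7, N=16, M=16, mode="global"):
        super().__init__()
        self.mode = mode
        self.proj = nn.Linear(K * K * C, D)                      # patch -> D
        self.pos_local = nn.Parameter(torch.randn(L * L, D) * 0.02)
        self.pos_global = nn.Parameter(torch.randn(N * M, D) * 0.02)
        self.mask_token = nn.Parameter(torch.zeros(D))           # covered patch

    def forward(self, patches, grid_idx, masked_slot):
        # patches: (B, L*L, K*K*C); grid_idx: (L*L,) flat positions of the
        # window's patches in the full N x M grid; masked_slot: index of the
        # covered patch within the window.
        x = self.proj(patches)
        x[:, masked_slot] = self.mask_token                      # hide the target
        pos = self.pos_local if self.mode == "local" else self.pos_global[grid_idx]
        return x + pos                                           # (B, L*L, D)
```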

3.2 Multihead Feature Self-Attention
In the original MSA module, q and k are kept at D dimensions. In the authors' task, however, the patches within a training image are often very similar, which makes the computed attention weights nearly uniform. The authors therefore slightly modify the Transformer's multi-head attention module: when computing q and k, an MLP performs a nonlinear dimensionality reduction (to D/2). They call this MFSA (multihead feature self-attention).
The MLP has a hidden dimension of 2D, i.e., the mapping is D → 2D → D/2.

This accelerates model convergence and improves accuracy, but it also increases the parameter count.
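A sketch of one MFSA layer under these assumptions (the activation function and head layout are not specified in the post, so GELU and a standard multi-head split are our choices):

```python
import torch
import torch.nn as nn

class MFSA(nn.Module):
    """Sketch of multihead feature self-attention (Sec. 3.2), as we read it:
    q and k pass through a small MLP (D -> 2D -> D/2) instead of a plain
    D -> D linear map, so near-identical patch features no longer produce
    almost-uniform attention weights."""
    def __init__(self, D=512, heads=8):
        super().__init__()
        assert D % (2 * heads) == 0
        self.h, self.dk, self.dv = heads, D // 2 // heads, D // heads
        def mlp():  # nonlinear reduction D -> 2D -> D/2
            return nn.Sequential(nn.Linear(D, 2 * D), nn.GELU(), nn.Linear(2 * D, D // 2))
        self.q_mlp, self.k_mlp = mlp(), mlp()
        self.v = nn.Linear(D, D)
        self.out = nn.Linear(D, D)

    def forward(self, x):                                  # x: (B, T, D)
        B, T, _ = x.shape
        q = self.q_mlp(x).view(B, T, self.h, self.dk).transpose(1, 2)
        k = self.k_mlp(x).view(B, T, self.h, self.dk).transpose(1, 2)
        v = self.v(x).view(B, T, self.h, self.dv).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.dk ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)   # concat heads
        return self.out(y)
```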
3.3 Network Architecture
Finally, the overall network architecture is shown below. The left of the figure shows a single Transformer block; the input and output of each block are L²×D. The output of the last block is averaged over the sequence (giving a D-dimensional vector), then mapped to the inpainting result (K²·C).
Alternatively, the first output token of the last layer can be linearly mapped directly, which is similar to ViT.
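The output path can be sketched as follows (a minimal reading of Sec. 3.3; layer details are placeholders, not the official implementation):

```python
import torch.nn as nn

class InTraHead(nn.Module):
    """Sketch of the output path: the last block emits an L^2 x D sequence;
    averaging over the sequence gives a D vector, which is mapped to the
    K*K*C inpainted patch."""
    def __init__(self, D=512, K=16, C=3):
        super().__init__()
        self.to_patch = nn.Linear(D, K * K * C)
        self.K, self.C = K, C

    def forward(self, tokens):            # tokens: (B, L*L, D) from the last block
        pooled = tokens.mean(dim=1)       # average over the sequence -> (B, D)
        patch = self.to_patch(pooled)     # -> (B, K*K*C)
        return patch.view(-1, self.C, self.K, self.K)
```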

3.4 Training
Randomly select a window of size L, mask one patch inside the window, and feed all patches in the window into the Transformer to perform the inpainting task.
The loss function is a pixel-level L2 loss, combined with SSIM and GMS losses.
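A hedged sketch of such a combined loss: the equal weighting, the GMS constant, and the use of the third-party pytorch_msssim package for SSIM are all our assumptions, not details from the post.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # third-party package, assumed available

def grad_mag(x):
    """Gradient magnitude of a (B, C, H, W) image via Prewitt filters."""
    gray = x.mean(dim=1, keepdim=True)
    hx = (torch.tensor([[1., 0., -1.]] * 3) / 3.0).view(1, 1, 3, 3).to(x)
    gx = F.conv2d(gray, hx, padding=1)
    gy = F.conv2d(gray, hx.transpose(-2, -1), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def gms(x, y, c=0.0026):
    """Gradient magnitude similarity map; c is a stabilizing constant."""
    gx, gy = grad_mag(x), grad_mag(y)
    return (2 * gx * gy + c) / (gx ** 2 + gy ** 2 + c)

def intra_loss(pred, target):
    """Pixel-level L2 plus SSIM and GMS terms; equal weights are our assumption."""
    l2 = F.mse_loss(pred, target)
    l_ssim = 1.0 - ssim(pred, target, data_range=1.0)
    l_gms = (1.0 - gms(pred, target)).mean()
    return l2 + l_ssim + l_gms
```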
3.5 Inference and Anomaly Detection
First, a pixel-level anomaly score map is computed from the difference between the reconstructed image and the original; the largest value in the map is then taken as the image-level detection score.
Consistent with training, the test image is divided into N×M patches. For the patch at position (t, u), a surrounding window of size L is selected by the formula below, where (r, s) is the coordinate of the window's top-left corner.

The rule keeps the target patch as close to the center of the window as possible.
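Since the formula itself is shown only as an image above, here is our reading of that clamping rule as code:

```python
def window_origin(t, u, N, M, L=7):
    """Top-left corner (r, s) of the L x L window around patch (t, u),
    clamped to the N x M patch grid. This is our interpretation of
    'keep the patch as close to the window's center as possible'."""
    r = min(max(t - L // 2, 0), N - L)
    s = min(max(u - L // 2, 0), M - L)
    return r, s
```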
Finally, inpainting all N×M patches yields a reconstruction of the whole image. Notably, when computing the anomaly score map the authors do not use the L2 distance; instead they adopt a GMS-based method: gradient magnitude similarity is computed at the {1/2, 1/4} scales, followed by mean filtering and Gaussian filtering.

This yields anomaly maps m1 and m2 at the two scales; both are restored to the original image size, and their pixel-wise mean is taken as the anomaly score map.
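A sketch of this multi-scale GMS score map, reusing the same Prewitt-based gradient magnitude as in the loss sketch above; the mean-filter size is an assumption, and the Gaussian smoothing step described in the paper is omitted here.

```python
import torch
import torch.nn.functional as F

def grad_mag(x):
    """Gradient magnitude of a (B, C, H, W) image via Prewitt filters."""
    gray = x.mean(dim=1, keepdim=True)
    hx = (torch.tensor([[1., 0., -1.]] * 3) / 3.0).view(1, 1, 3, 3).to(x)
    gx = F.conv2d(gray, hx, padding=1)
    gy = F.conv2d(gray, hx.transpose(-2, -1), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-12)

def anomaly_map(orig, recon, scales=(2, 4), c=0.0026):
    """1 - GMS at 1/2 and 1/4 scale, mean-filtered, upsampled to full size,
    then averaged pixel-wise (m1 and m2 in the text).
    (The paper's Gaussian smoothing step is omitted in this sketch.)"""
    maps = []
    for s in scales:
        o, r = F.avg_pool2d(orig, s), F.avg_pool2d(recon, s)
        go, gr = grad_mag(o), grad_mag(r)
        m = 1.0 - (2 * go * gr + c) / (go ** 2 + gr ** 2 + c)
        m = F.avg_pool2d(m, 3, stride=1, padding=1)          # mean filter
        maps.append(F.interpolate(m, size=orig.shape[-2:],
                                  mode="bilinear", align_corners=False))
    return torch.stack(maps).mean(dim=0)                     # mean of m1, m2
```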
The anomaly score map is further refined: the pixel-wise mean anomaly score map computed over the training set T (which contains only normal samples) is subtracted from it, and the result is squared.

Finally, the largest pixel score in the score map is selected as the final image-level score.
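Putting the last two steps together, a minimal sketch of the final scoring as we read it:

```python
def image_score(score_map, train_mean_map):
    """Subtract the pixel-wise mean anomaly map computed over the all-normal
    training set T, square, then take the maximum pixel per image."""
    refined = (score_map - train_mean_map) ** 2      # (B, 1, H, W)
    return refined.flatten(1).max(dim=1).values      # one score per image
```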
4. Experiments
See the paper for the full results; some implementation details follow.
10% of the training images are randomly selected as a validation set (at most 20 images), used to monitor reconstruction quality. Within each epoch, 600 windows are randomly sampled from each image for training; random rotations and flips are used as augmentation.
The patch size is K=16 and the window size is L=7. Different image sizes are chosen for different MVTec categories, from {256×256, 320×320, 512×512}. The Transformer dimension D is set to 512. All resize operations use bilinear interpolation.
The optimizer is Adam. Transformer training takes relatively long, sometimes exceeding 500 epochs. Training stops when the validation loss shows no clear decrease for 50 epochs, and the best model is then selected for evaluation.
Looking Back
The main innovation of the paper is introducing the Transformer into the anomaly detection field via the inpainting task, with local and global position embedding schemes designed for different situations. It also tries a U-Net framework (though the ablation study shows it does not help all categories) and modifies MSA into MFSA, but these are minor modifications rather than major improvements.
Overall, inpainting is still reconstruction-based, and some noisy regions remain hard to reconstruct, for example regions containing text. Moreover, the method is patch-based, so each test sample requires many forward passes of the model to produce the final anomaly score map. Clear boundary artifacts can also be observed, as shown below:

Code
At the time of writing, no official code has been released for the paper.