ICML 2022 | Rethinking Anomaly Detection on Structured Data: What Kind of Graph Neural Network Do We Need?
2022-06-11 19:52:00 【PaperWeekly】

Author | Synced (Jiqizhixin) editorial team
Source | Synced (Jiqizhixin)
Graph neural networks (GNNs) are widely used for anomaly detection on structured data, for example to detect malicious accounts on social networks or fraudulent financial transactions. Researchers from the Hong Kong University of Science and Technology and Stanford University have, for the first time, analyzed the effect of anomalous data from the spectral perspective, that is, through the spectral decomposition of the graph Laplacian. Their core finding is that anomalies cause a "right-shift" of spectral energy: the energy distribution moves from low to high frequencies. Based on this finding, they propose the Beta Wavelet Graph Neural Network (BWGNN). BWGNN uses several band-pass filters with local properties to better capture the high-frequency anomaly information that the right-shift produces. On four large-scale graph anomaly detection datasets, BWGNN outperforms existing models.

Paper title:
Rethinking Graph Neural Networks for Anomaly Detection
Paper link:
https://arxiv.org/abs/2205.15508
Code link:
https://github.com/squareRoot3/Rethinking-Anomaly-Detection

Anomaly detection on graph-structured data: background and challenges
Anomaly detection is one of the classic tasks of data mining. Analyzing anomalous data helps businesses and users understand the mechanisms that produce it, so that they can make the right decisions and avoid losses. With the growth of the Internet, anomaly detection on structured data, and graph anomaly detection in particular, has received more and more attention.
Graph anomaly detection can be defined as finding the few objects on a graph (nodes, edges, subgraphs, etc.) whose distribution differs from that of most other objects. This paper focuses on detecting anomalous nodes. Traditional anomaly detection methods do not use the relationships between entities, but graph anomaly detection can exploit them. This makes it better suited to real-world scenarios such as network security, fraud detection, detection of paid spam accounts, financial risk control and fault monitoring.
The figure below contrasts traditional anomaly detection with graph anomaly detection.

▲ Figure 1. Traditional anomaly detection vs. graph anomaly detection
In recent years, graph neural networks have become a powerful tool for analyzing and processing structured data. A GNN learns embeddings that combine each node's own features with information from its neighbors. These embeddings are then used for downstream tasks such as classification, reconstruction and regression.
However, general-purpose GNNs such as graph convolutional networks are designed mainly for normal data. On anomaly detection tasks they are prone to "over-smoothing": the representations of anomalous and normal nodes become hard to tell apart, which hurts detection accuracy. In financial fraud detection, for example, a fraudulent account often camouflages itself by making normal transactions with many normal accounts to reduce suspicion, and only then starts illegal trading. Such "relation camouflage" makes graph anomaly detection even harder.
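The following minimal sketch makes the over-smoothing argument concrete. It uses a hypothetical seven-node toy graph and plain neighbour-mean aggregation as a stand-in for a low-pass GNN layer such as GCN. It has no weights or nonlinearities, so it is not any model from the paper. It tracks how quickly a camouflaged anomaly's feature blends into those of its normal neighbours.

```python
import numpy as np


def mean_aggregate(adj: np.ndarray, h: np.ndarray, steps: int) -> list:
    """Apply `steps` rounds of neighbour-mean aggregation h <- D~^-1 (A + I) h.

    Returns the features after 0, 1, ..., steps rounds.
    """
    a_tilde = adj + np.eye(len(adj))                   # add self-loops
    p = a_tilde / a_tilde.sum(axis=1, keepdims=True)   # row-normalised averaging
    history = [h]
    for _ in range(steps):
        h = p @ h
        history.append(h)
    return history


# Hypothetical toy graph: normal nodes 0-5 form a cycle; node 6 is an anomaly
# that camouflages itself by linking to the normal nodes 0, 2 and 4.
n = 7
adj = np.zeros((n, n))
for i in range(6):
    adj[i, (i + 1) % 6] = adj[(i + 1) % 6, i] = 1
for j in (0, 2, 4):
    adj[6, j] = adj[j, 6] = 1

x = np.zeros(n)
x[6] = 1.0  # a single scalar feature: 0 for normal nodes, 1 for the anomaly

# Gap between the anomaly and the average normal node after each round.
gaps = [abs(h[6] - h[:6].mean()) for h in mean_aggregate(adj, x, steps=10)]
for k, g in enumerate(gaps):
    print(f"round {k:2d}: gap = {g:.4f}")
```

The gap drops from 1.0 to 0.125 after a single round and keeps shrinking as all features converge toward a common value. In this sense, a purely low-pass aggregator hides the anomaly.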
To address these difficulties, researchers have proposed GNN models tailored to anomaly detection. These include models that 1) aggregate neighborhood information from multiple views using attention mechanisms; 2) aggregate different kinds of neighborhood information via resampling; and 3) add auxiliary loss functions to help train the GNN. All of these methods design the network to handle anomalies from the spatial-domain perspective. None had considered the problem from the spectral-domain perspective.
Yet it has been shown that the choice of spectral filter affects a GNN's expressive power and therefore its performance.

A fresh angle: graph anomaly detection from the spectral perspective
To fill this gap, the paper sets out to answer one question: how should the spectral filter of a GNN be tailored for anomaly detection?
The paper is the first to analyze anomalies on graphs from the spectral perspective. It observes that anomalies cause a "right-shift" of spectral energy: less energy is concentrated at low frequencies and more at high frequencies.
To visualize this right-shift, the researchers first generated a random Barabási–Albert (BA) graph with 500 nodes. They assumed that the attributes of normal and anomalous nodes follow two different Gaussian distributions, with the anomalous nodes having the larger variance.
In the figure below, the top row shows data with different degrees of anomaly on the BA graph, and the bottom row shows the corresponding spectral energy distributions. The bars give the share of energy in each spectral interval. The line gives the cumulative share of energy from frequency zero up to that point.
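The following sketch reproduces the spirit of this experiment under our own assumptions:

- a simple preferential-attachment generator with m = 3 edges per new node;
- the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2), whose spectrum lies in [0, 2];
- normal attributes drawn from N(1, 1) and anomalous ones from N(1, σ²), with σ treated as a standard deviation.

The energy at eigenvalue λ_k is the squared graph-Fourier coefficient (u_kᵀx)², normalized to sum to one. The histogram and its cumulative sum play the role of the bars and the line in the figure. The printed numbers are illustrative, not the paper's.

```python
import numpy as np


def barabasi_albert(n: int, m: int, rng: np.random.Generator) -> np.ndarray:
    """Dense adjacency matrix of a simple preferential-attachment (BA) graph."""
    adj = np.zeros((n, n))
    adj[: m + 1, : m + 1] = 1 - np.eye(m + 1)  # seed: complete graph on m + 1 nodes
    deg = adj.sum(axis=1)
    for v in range(m + 1, n):
        prob = deg[:v] / deg[:v].sum()          # attach proportionally to degree
        targets = rng.choice(v, size=m, replace=False, p=prob)
        adj[v, targets] = adj[targets, v] = 1
        deg[targets] += 1
        deg[v] = m
    return adj


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^-1/2 A D^-1/2 (isolated nodes get D^-1/2 = 0)."""
    d = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def spectral_energy(lap: np.ndarray, x: np.ndarray, bins: np.ndarray):
    """Share of spectral energy per eigenvalue interval, and its running total."""
    lam, u = np.linalg.eigh(lap)
    lam = np.clip(lam, 0.0, 2.0)   # remove round-off outside [0, 2]
    x_hat = u.T @ x                # graph Fourier transform
    energy = x_hat ** 2 / np.sum(x_hat ** 2)
    hist, _ = np.histogram(lam, bins=bins, weights=energy)
    return hist, np.cumsum(hist)


rng = np.random.default_rng(0)
n = 500
adj = barabasi_albert(n, m=3, rng=rng)   # m = 3 is our assumption, not from the paper
lap = normalized_laplacian(adj)
bins = np.linspace(0.0, 2.0, 9)          # eight intervals of width 0.25


def make_signal(alpha: float, sigma: float) -> np.ndarray:
    """Normal nodes ~ N(1, 1); a fraction alpha are anomalies ~ N(1, sigma^2) (assumed)."""
    x = rng.normal(1.0, 1.0, size=n)
    k = int(alpha * n)
    idx = rng.choice(n, size=k, replace=False)
    x[idx] = rng.normal(1.0, sigma, size=k)
    return x


shares = []
for alpha, sigma in [(0.0, 1.0), (0.05, 5.0), (0.2, 5.0)]:
    hist, cum = spectral_energy(lap, make_signal(alpha, sigma), bins)
    shares.append(cum[1])          # cumulative energy up to lambda = 0.5
    print(f"alpha={alpha:.2f} sigma={sigma:.0f}: energy below lambda=0.5 is {cum[1]:.2f}")
```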

▲ Figure 2. Visualization of the spectral energy "right-shift"
As the figure shows, when the proportion of anomalies is 0%, most of the energy is concentrated at low frequencies (λ < 0.5). The data become more anomalous when σ (the variance of anomalous nodes relative to normal nodes) or the anomaly proportion α increases. As they do, the energy in the low-frequency part of the spectrum gradually decreases and the high-frequency part grows accordingly. Under the Gaussian assumption, the paper rigorously proves that the degree of anomaly and the right-shift are monotonically related.
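The monotone trend can also be checked without an eigendecomposition. For the normalized Laplacian, the energy-weighted mean eigenvalue Σ_k λ_k x̂_k² / Σ_k x̂_k² equals the Rayleigh quotient xᵀLx / xᵀx, so a larger value means energy has moved right.

The sketch below sweeps σ and α using one fixed noise draw. It assumes:

- a simple preferential-attachment graph with m = 3;
- every node with mean 1;
- normal nodes with standard deviation 1 and anomalies with standard deviation σ.

It illustrates the trend empirically; it is not the paper's proof.

```python
import numpy as np


def barabasi_albert(n: int, m: int, rng: np.random.Generator) -> np.ndarray:
    """Dense adjacency matrix of a simple preferential-attachment (BA) graph."""
    adj = np.zeros((n, n))
    adj[: m + 1, : m + 1] = 1 - np.eye(m + 1)
    deg = adj.sum(axis=1)
    for v in range(m + 1, n):
        targets = rng.choice(v, size=m, replace=False, p=deg[:v] / deg[:v].sum())
        adj[v, targets] = adj[targets, v] = 1
        deg[targets] += 1
        deg[v] = m
    return adj


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^-1/2 A D^-1/2 (isolated nodes get D^-1/2 = 0)."""
    d = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def high_freq_area(lap: np.ndarray, x: np.ndarray) -> float:
    """Energy-weighted mean eigenvalue, computed as the Rayleigh quotient x^T L x / x^T x."""
    return float(x @ lap @ x / (x @ x))


rng = np.random.default_rng(0)
n = 500
lap = normalized_laplacian(barabasi_albert(n, m=3, rng=rng))
z = rng.standard_normal(n)    # one fixed noise draw reused for every setting
order = rng.permutation(n)    # anomalies = first alpha * n nodes of a fixed random order


def make_signal(alpha: float, sigma: float) -> np.ndarray:
    scale = np.ones(n)
    scale[order[: int(alpha * n)]] = sigma   # anomalies: std sigma; normal nodes: std 1
    return 1.0 + scale * z                   # every node has mean 1 (our assumption)


by_sigma = [high_freq_area(lap, make_signal(0.1, s)) for s in (1.0, 2.0, 5.0, 10.0)]
by_alpha = [high_freq_area(lap, make_signal(a, 5.0)) for a in (0.0, 0.05, 0.1, 0.2)]
print("vary sigma (alpha = 0.1):", np.round(by_sigma, 3))
print("vary alpha (sigma = 5):  ", np.round(by_alpha, 3))

# Sanity check: the Rayleigh quotient matches the eigendecomposition-based definition.
lam, u = np.linalg.eigh(lap)
x = make_signal(0.1, 5.0)
x_hat = u.T @ x
assert np.isclose(np.sum(lam * x_hat ** 2) / np.sum(x_hat ** 2), high_freq_area(lap, x))
```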
Real-world anomalies usually follow more complex distributions, but the researchers confirmed that the right-shift also occurs on four large-scale graph anomaly detection datasets. The figure below uses the Amazon anomalous-user dataset as an example. Deleting some anomalous nodes markedly increases the low-frequency energy of the spectrum and reduces the high-frequency energy accordingly. Deleting the same number of random nodes leaves the energy distribution almost unchanged. This further confirms that anomalies are the key cause of the right-shift.
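A synthetic analogue of the Drop-Random / Drop-Anomaly comparison can be sketched as follows. It does not use the Amazon data. Instead, it builds an assumed BA graph with 100 Gaussian anomalies (standard deviation 5 versus 1 for normal nodes). It then deletes either those anomalies or 100 random nodes and reports the share of spectral energy below λ = 0.5 on the remaining induced subgraph. The 0.5 cutoff and all sizes are our choices for illustration.

```python
import numpy as np


def barabasi_albert(n: int, m: int, rng: np.random.Generator) -> np.ndarray:
    """Dense adjacency matrix of a simple preferential-attachment (BA) graph."""
    adj = np.zeros((n, n))
    adj[: m + 1, : m + 1] = 1 - np.eye(m + 1)
    deg = adj.sum(axis=1)
    for v in range(m + 1, n):
        targets = rng.choice(v, size=m, replace=False, p=deg[:v] / deg[:v].sum())
        adj[v, targets] = adj[targets, v] = 1
        deg[targets] += 1
        deg[v] = m
    return adj


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^-1/2 A D^-1/2; nodes isolated by deletion get D^-1/2 = 0."""
    d = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def low_freq_share(adj: np.ndarray, x: np.ndarray, cutoff: float = 0.5) -> float:
    """Share of the signal's spectral energy on eigenvalues below `cutoff`."""
    lam, u = np.linalg.eigh(normalized_laplacian(adj))
    energy = (u.T @ x) ** 2
    return float(energy[lam < cutoff].sum() / energy.sum())


rng = np.random.default_rng(1)
n, k = 500, 100
adj = barabasi_albert(n, m=3, rng=rng)
anomalies = rng.choice(n, size=k, replace=False)
x = rng.normal(1.0, 1.0, size=n)              # normal nodes ~ N(1, 1)
x[anomalies] = rng.normal(1.0, 5.0, size=k)   # anomalies ~ N(1, 25)


def drop(nodes: np.ndarray):
    """Induced subgraph and signal after deleting `nodes`."""
    keep = np.setdiff1d(np.arange(n), nodes)
    return adj[np.ix_(keep, keep)], x[keep]


original = low_freq_share(adj, x)
drop_random = low_freq_share(*drop(rng.choice(n, size=k, replace=False)))
drop_anomaly = low_freq_share(*drop(anomalies))
print(f"Original:     {original:.2f}")
print(f"Drop-Random:  {drop_random:.2f}")
print(f"Drop-Anomaly: {drop_anomaly:.2f}")
```

With this setup, deleting the anomalies should raise the low-frequency share noticeably. Deleting random nodes should leave it roughly where it was, mirroring the pattern described above.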

▲ Figure 3. Effect of deleting different nodes on the spectral energy distribution of the Amazon anomalous-user dataset: original graph (The Original), random nodes deleted (Drop-Random), anomalous nodes deleted (Drop-Anomaly)

A new tool for graph anomaly detection: the Beta Wavelet Graph Neural Network
The analysis above shows that graph anomaly detection must account for the right-shift. In the Amazon dataset above, for instance, the spectral information near the eigenvalue λ = 1 is closely tied to the anomalies. To capture anomaly information better, a GNN should behave like a band-pass filter, keeping signals near λ = 1 while filtering out the rest.
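The sketch below shows why a band-pass response suits the right-shift by comparing two idealized spectral filters. These filters are an assumption for illustration only; BWGNN itself uses neither hard cutoffs nor eigendecompositions.

- A low-pass filter keeps λ < 0.5.
- A band-pass filter keeps |λ − 1| < 0.5.

The graph is an assumed Erdős–Rényi random graph with average degree 6, and the anomalies are Gaussian with a larger standard deviation. The contrast score (mean absolute filtered value on anomalous nodes divided by that on normal nodes) is our own illustrative measure.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray) -> np.ndarray:
    """L = I - D^-1/2 A D^-1/2 (isolated nodes get D^-1/2 = 0)."""
    d = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.where(d > 0, d ** -0.5, 0.0)
    return np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


rng = np.random.default_rng(2)
n, avg_deg = 500, 6
upper = np.triu(rng.random((n, n)) < avg_deg / (n - 1), k=1)  # Erdos-Renyi edges (assumed graph)
adj = (upper | upper.T).astype(float)
lam, u = np.linalg.eigh(normalized_laplacian(adj))

is_anom = np.zeros(n, dtype=bool)
is_anom[rng.choice(n, size=100, replace=False)] = True
x = rng.normal(1.0, 1.0, size=n)                              # normal nodes ~ N(1, 1)
x[is_anom] = rng.normal(1.0, 5.0, size=int(is_anom.sum()))    # anomalies ~ N(1, 25)


def apply_filter(response) -> np.ndarray:
    """y = U g(Lambda) U^T x for an idealized spectral response g."""
    return u @ (response(lam) * (u.T @ x))


def contrast(y: np.ndarray) -> float:
    """Mean |y| on anomalous nodes divided by mean |y| on normal nodes."""
    return float(np.abs(y[is_anom]).mean() / np.abs(y[~is_anom]).mean())


low = contrast(apply_filter(lambda ev: (ev < 0.5).astype(float)))
band = contrast(apply_filter(lambda ev: (np.abs(ev - 1.0) < 0.5).astype(float)))
print(f"low-pass  (lambda < 0.5):       contrast = {low:.2f}")
print(f"band-pass (|lambda - 1| < 0.5): contrast = {band:.2f}")
```

The low-pass output is dominated by the smooth component that all nodes share, so anomalous and normal nodes look alike. The band-pass output keeps the node-specific fluctuations that carry the anomalies.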
Unfortunately, most existing GNNs are low-pass or adaptive filters, and neither guarantees band-pass behavior. An adaptive filter can in principle fit any function, but in anomaly detection it may degenerate into a low-pass filter. Across the whole dataset, the high-frequency information associated with anomalies is only a small share of the total, and most of the spectral energy still sits at low frequencies.
To handle the right-shift better, the researchers propose a new graph anomaly detection method, the Beta Wavelet Graph Neural Network (BWGNN). Drawing on Hammond's graph wavelet theory, they design a new wavelet kernel based on the Beta function and use it as the GNN's spectral filter.
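Below is a minimal sketch of a Beta wavelet filter bank in the style of BWGNN, written from our reading of the construction. The Beta kernel β_{p,q}(w) = w^p (1 − w)^q / B(p + 1, q + 1) is rescaled to the spectrum [0, 2] as ½ β_{p,q}(λ/2). This gives the polynomial filter W_{p,q} = (L/2)^p (I − L/2)^q / (2 B(p + 1, q + 1)). Taking p + q = C for p = 0, …, C yields C + 1 filters, and because each filter is a polynomial in L, no eigendecomposition is needed.

The exact normalization, the function names and the example choice C = 2 are assumptions. The learnable layers around the filters (in our understanding, an MLP before them and a classifier after) are omitted.

```python
import math

import numpy as np


def beta_fn(p: int, q: int) -> float:
    """Beta function B(p + 1, q + 1) = p! q! / (p + q + 1)! for non-negative integers."""
    return math.factorial(p) * math.factorial(q) / math.factorial(p + q + 1)


def beta_wavelet_filters(lap: np.ndarray, order: int) -> list:
    """W_{p,q} = (L/2)^p (I - L/2)^q / (2 B(p+1, q+1)) for p = 0..order, q = order - p."""
    half = lap / 2.0
    rest = np.eye(len(lap)) - half
    return [
        np.linalg.matrix_power(half, p)
        @ np.linalg.matrix_power(rest, order - p)
        / (2.0 * beta_fn(p, order - p))
        for p in range(order + 1)
    ]


def beta_wavelet_features(lap: np.ndarray, x: np.ndarray, order: int) -> np.ndarray:
    """Apply every wavelet filter to x and concatenate the outputs column-wise."""
    return np.concatenate([w @ x for w in beta_wavelet_filters(lap, order)], axis=1)


# Hypothetical example: a cycle of 8 nodes with 4-dimensional node features.
n, order = 8, 2
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1
lap = np.eye(n) - adj / 2.0   # normalized Laplacian: every node has degree 2
x = np.random.default_rng(0).normal(size=(n, 4))

h = beta_wavelet_features(lap, x, order)
print(h.shape)   # (8, 12): three filters x four features
```

Because W_{p,q} is a degree-C polynomial in L, entry W[i, j] is zero whenever nodes i and j are more than C hops apart. This is one way to see the filters' spatial locality.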
Compared with the commonly used heat kernel, the Beta function as a wavelet kernel not only satisfies the band-pass requirement but also has better locality in both the spectral and the spatial domain. The figure below compares the heat kernel wavelet with the Beta kernel wavelet.
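The spectral side of this comparison can be checked numerically. This sketch evaluates the heat kernel e^(−sλ), with s = 1 as our choice, and the rescaled Beta kernel ½ β_{p,q}(λ/2) on [0, 2]; the Beta kernel's normalization follows our reading of the paper.

The heat kernel always peaks at λ = 0, so it is low-pass. The Beta kernel peaks at λ = 2p/(p + q), so a single family with p + q = 2 covers three responses, each integrating to 1 over the spectrum:

- low-pass (p = 0);
- band-pass around λ = 1 (p = 1);
- high-pass (p = 2).

```python
import math

import numpy as np


def heat_kernel(lam: np.ndarray, s: float = 1.0) -> np.ndarray:
    """Heat kernel response g(lambda) = exp(-s * lambda); s = 1 is an assumed scale."""
    return np.exp(-s * lam)


def beta_kernel(lam: np.ndarray, p: int, q: int) -> np.ndarray:
    """Rescaled Beta kernel 0.5 * beta_{p,q}(lambda / 2) on [0, 2] (normalization assumed)."""
    w = lam / 2.0
    b = math.factorial(p) * math.factorial(q) / math.factorial(p + q + 1)  # B(p+1, q+1)
    return 0.5 * w ** p * (1.0 - w) ** q / b


lam = np.linspace(0.0, 2.0, 2001)
dx = lam[1] - lam[0]
print(f"heat kernel (s=1): peak at lambda = {lam[np.argmax(heat_kernel(lam))]:.2f}")
for p, q in [(0, 2), (1, 1), (2, 0)]:
    g = beta_kernel(lam, p, q)
    area = float(np.sum(g[:-1] + g[1:]) * dx / 2)  # trapezoidal rule over [0, 2]
    print(f"Beta kernel p={p}, q={q}: peak at lambda = {lam[np.argmax(g)]:.2f}, area = {area:.4f}")
```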

▲ Figure 4. Heat kernel and Beta kernel wavelets in the spectral domain (left) and the spatial domain (right); the Beta kernel has better band-pass and locality properties.
The paper evaluates BWGNN on four large-scale graph anomaly detection datasets:

- Yelp: detecting anomalous reviews on a review website;
- Amazon: detecting anomalous users on an e-commerce platform;
- T-Finance: detecting anomalous users in a transaction network;
- T-Social: detecting anomalous users in a social network. It is the largest, with as many as 5 million nodes and 70 million edges.
The tables below compare BWGNN with traditional classifiers, general-purpose GNNs and specialized graph anomaly detection models. BWGNN achieves better results both with 40% of the data used for training and with only 1% (the semi-supervised setting). In terms of efficiency, BWGNN's running time is close to that of most general-purpose GNNs and lower than that of other graph anomaly detection models.



Summary
The researchers found that anomalous nodes on a graph cause a right-shift of spectral energy, which offers a new perspective on anomaly detection for structured data. Building on this finding, they present a new tool for graph anomaly detection, the Beta Wavelet Graph Neural Network (BWGNN). It uses purpose-designed band-pass filters to capture the high-frequency anomaly information produced by the right-shift and achieves the best results on multiple datasets.
In real-world deployments, graph anomaly detection is usually a complex piece of systems engineering, and choosing the right GNN is a key factor in overall performance. BWGNN is simple in design, has low complexity and is easy to swap in, which makes it a new option for the GNN component of such systems.