A Unifying Review of Deep and Shallow Anomaly Detection
A review of shallow models and deep learning models in anomaly detection
Abstract: With the emergence of many anomaly detection methods (based on generative models, one-class classification, reconstruction, etc.), the field calls for a systematic summary. In this review, the authors aim to identify the common principles and assumptions underlying the various anomaly detection methods, show the connections between classic "shallow" algorithms and novel "deep" algorithms, then empirically evaluate existing methods, and finally discuss open challenges and research directions in anomaly detection.
Related posts:
- Overview of anomaly detection: A Unifying Review of Deep and Shallow Anomaly Detection
  Link: https://zhuanlan.zhihu.com/p/419163607
- [Paper reading] A unified view of shallow models and deep learning models in anomaly detection
  Link: https://zhuanlan.zhihu.com/p/457976793
1. Background
(1) Venue / publication level
Ruff L, Kauffmann J R, Vandermeulen R A, et al. A unifying review of deep and shallow anomaly detection[J]. Proceedings of the IEEE, 2021, 109(5): 756-795.
CCF A
(2) Author team
Dr. Lukas Ruff et al., TU Berlin (Technische Universität Berlin).
2. Introduction
2.1 What is an anomaly? (Definition)
An anomaly is an observation that deviates considerably from the concept of normality. Also known as an outlier or a novelty; depending on the context, such an observation may be called anomalous, irregular, atypical, inconsistent, unexpected, rare, erroneous, faulty, fraudulent, malicious, unnatural, or strange.
2.2 Common anomaly detection algorithms
AD (also called outlier detection or novelty detection) is the research field that studies data-driven methods, models, and algorithms for detecting such anomalous observations. Classic AD methods include PCA [1]–[5], the OC-SVM [6], SVDD [7], nearest-neighbor algorithms [8]–[10], and KDE [11], [12].
What these methods have in common is that they are all unsupervised, which constitutes the predominant approach to AD. The reason is that, in the standard AD setting, labeled anomalous data are usually unavailable, and even when available they are rarely sufficient to characterize every concept of anomalousness. This typically renders supervised methods ineffective.
Instead, a central idea in AD is to learn a model of normality from normal data in an unsupervised manner, so that anomalies can be detected as deviations from that model.
2.3 Research history
Research on AD has a long history spanning multiple disciplines, including engineering, machine learning, data mining, and statistics. Although the first formal definitions of so-called "discordant observations" date back to the 19th century [13], the AD problem was likely studied informally even earlier, since anomalies occur naturally in disciplines such as medicine and the natural sciences. Anomalous data may be useless, for example when caused by measurement errors, or it may be extremely informative and hold the key to new insights, such as cancer patients who survive unusually long.
2.4 Anomaly detection applications
Today, AD has numerous applications across a variety of domains. For example:
- Intrusion detection in cybersecurity [15]–[20]
- Fraud detection in finance, insurance, healthcare, and telecommunications [21]–[27]
- Industrial fault and damage detection [28]–[36]
- Infrastructure monitoring [37], [38]
- Stock market surveillance [39], [40]
- Acoustic novelty detection [41]–[45]
- Medical diagnosis [46]–[60]
- Disease outbreak detection [61], [62]
- Event detection in the earth sciences [63]–[68]
- Scientific discovery in chemistry [69], [70]
- Bioinformatics [71]
- Genetics [72], [73]
- Physics [74], [75]
- Astronomy [76]–[79]
The scale of data available in these domains keeps growing, and it increasingly includes complex data types such as images, video, audio, text, graphs, multivariate time series, and biological sequences. For applications on such complex, high-dimensional data to succeed, a meaningful representation of the data is crucial [80].
2.5 Deep learning methods in anomaly detection
Deep learning [81]–[83] follows the idea of learning effective representations from the data itself by training flexible, multilayered ("deep") neural networks, and it has greatly advanced the state of the art in many applications involving complex data types. Deep neural networks provide the most successful solutions for many tasks in fields such as computer vision [84]–[93], speech recognition [94]–[103], and natural language processing [104]–[113], and have contributed to the sciences [114]–[123]. Methods based on deep neural networks exploit the intrinsically hierarchical or latent structure of data through multilayered, distributed feature representations.
Advances in parallel computing, SGD optimization, and automatic differentiation have made it possible to apply deep learning at scale on large datasets.
Recently, there has been great interest in developing deep learning methods for AD. This is driven by the lack of effective methods for AD tasks on complex data, for example, detecting cancer among the billions of pixels of whole-slide histopathology images [124]. As in other applications of deep learning, the goal of deep AD is to reduce the burden of manual feature engineering and to enable effective, scalable solutions.
However, unlike supervised deep learning, it is much less clear what a useful representation learning objective for deep AD is, since the problem is largely unsupervised in nature.
Deep anomaly detection methods include deep AE variants, deep one-class classification, and methods based on DGMs such as GANs.
In traditional AD methods, the feature representation is fixed a priori (for example, via a kernel feature map), whereas deep methods learn the feature map from the data.
However, a comprehensive treatment of deep learning methods within the overall context of AD research, especially in relation to its kernel-based learning part [6], [7], [183], is still missing.
In this review, the authors aim to fill this gap by presenting a unifying view that connects traditional shallow methods with novel deep methods.
3. An introduction to anomaly detection
3.1 The importance of anomaly detection
(1) Anomaly detection is part of everyday life, for instance in credit card payments, account logins, and corporate internal networks. Anomaly detection algorithms are essential to many of today's applications and services, where even small improvements can translate into large economic gains.
(2) The ability to detect anomalies also plays an important role in ensuring the safety and robustness of systems built on deep learning, such as medical systems and autonomous driving systems.
3.2 Challenges of anomaly detection
(1) Normal samples may have high variance, which causes normal samples to be flagged as anomalous (type I errors) or anomalous samples to go undetected (type II errors). To reduce this variance, preprocessing, standardization, and feature selection are essential steps.
(2) Anomalies are usually scarce, which leads to imbalanced training data; in most cases, the data are not labeled at all.
(3) Anomalies may come in many different kinds, making them hard to capture with a single model. Models therefore generally learn the distribution of normal samples and flag samples that deviate from it as anomalous. However, this idea fails if the distribution of normal samples keeps changing.
Anomaly detection can reveal new, unknown patterns, thereby providing new insights and hypotheses.
3.3 Formal Definition of Anomaly Detection
This section defines anomalies from a probabilistic perspective, describes the types of anomalies, and distinguishes the nuances among three related concepts: an anomaly, an outlier, and a novelty.
It also puts forward a basic principle of anomaly detection, the concentration assumption, together with a mathematical formulation.
(1) What is an anomaly?
Definition: an anomaly is an observation that deviates considerably from the concept of normality.
Probabilistic form of the definition:
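The formula itself is missing from these notes. Paraphrasing the paper from memory: if the normal data follow a law $\mathbb{P}^+$ with density $p^+$ on the data space $\mathcal{X}$, anomalies are points of sufficiently low density,

$$\mathcal{A} \;=\; \{\, x \in \mathcal{X} : p^{+}(x) \le \tau \,\}, \qquad \tau \ge 0,$$

where the threshold $\tau$ is chosen so that $\mathcal{A}$ has small probability under $\mathbb{P}^+$.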
(2) Types of anomalies
- Point anomalies
- Conditional / contextual anomalies
- Group / collective anomalies
(3) Anomaly, outlier, or novelty?
(4) Concentration assumption
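The accompanying formula is also missing here. Roughly following the paper: the normal data distribution concentrates on a set of small volume, whereas anomalies need not be concentrated; that is, there exists some threshold $\tau \ge 0$ such that the level set

$$C_\tau = \{\, x \in \mathcal{X} : p^{+}(x) > \tau \,\}$$

is nonempty and has small (finite) volume.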
(5) Density level set estimation
(6) Density estimation for level set estimation
A common approach to density level set estimation is density estimation itself.
The core idea of this approach: use any given density estimator (parametric or nonparametric) and threshold it at the level corresponding to the target false alarm rate α, as sketched below.
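A minimal sketch of this plug-in approach in Python, assuming scikit-learn's KernelDensity on two-dimensional toy data; the bandwidth and the level α = 0.05 are illustrative choices, not values from the paper:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))                    # normal samples only
X_test = np.vstack([rng.normal(size=(5, 2)),            # likely normal
                    rng.uniform(-6, 6, size=(5, 2))])   # likely anomalous

alpha = 0.05                                 # target false alarm rate
kde = KernelDensity(bandwidth=0.5).fit(X_train)

# Threshold = empirical alpha-quantile of the estimated log-density,
# so roughly a fraction alpha of normal training points falls below it.
tau = np.quantile(kde.score_samples(X_train), alpha)

is_anomaly = kde.score_samples(X_test) < tau
print(is_anomaly)
```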
(7) Threshold versus score
(8) Selecting a level α
The choice of α depends on the application. As α increases, the detector concentrates on the most likely regions of the distribution, so more samples are flagged as anomalous. This is useful in medical diagnosis and fraud detection, where missing an anomaly carries a huge cost. On the other hand, a larger α leads to a higher false alarm rate, which is unacceptable in settings such as monitoring tasks, where large volumes of data are generated continuously. The choice of α also rests on underlying assumptions about the distribution, discussed in other sections.
3.4 Dataset settings and dataset properties
The dataset setting refers to the unsupervised, semisupervised, or weakly supervised mode; data properties refer to the data type or dimensionality. Both vary greatly across real-world scenarios. This section covers: (1) the distribution of anomalies; (2) the unsupervised setting; (3) the semisupervised setting; (4) the supervised setting; (5) further data properties.
(1) Distribution of Anomalies
(2) Unsupervised Setting
(3) Semisupervised Setting
(4) Supervised Setting
(5) Further Data Properties
(For details, see https://zhuanlan.zhihu.com/p/419163607.)
3.5 Challenges in Anomaly Detection
This section highlights some noteworthy challenges in anomaly detection.
(1) Most anomaly detection is unsupervised, so it shares the challenges of unsupervised learning in general.
(2) How to design an anomaly score or threshold.
(3) How the level α trades off false alarms against missed anomalies.
(4) The impact of noise and contamination on model robustness.
(5) The influence of data properties on modeling.
(6) Is the data-generating process stationary? Does the data distribution shift at test time?
(7) For high-dimensional data, how to cope with the curse of dimensionality, and how to select task-relevant features?
(8) Challenges specific to deep anomaly detection: more hyperparameters, architecture selection, and optimization settings.
(9) As data and models grow more complex, model interpretability becomes an important problem.
4. Density Estimation and Probabilistic Models
The first family of methods the authors introduce predicts anomalies by estimating the probability distribution of the normal data.
The wealth of existing probabilistic models is therefore an obvious candidate for AD tasks. This includes classic density estimation methods [252] as well as deep statistical models. Below, we review how these techniques can be adapted to AD.
4.1 Classic Density Estimation
Laurikkala et al. measure anomalousness by the Mahalanobis distance from a test sample to the mean of the training samples, one of the most basic approaches. Jain et al. fit a multivariate Gaussian to the training data and evaluate the likelihood of a test sample under the model, which captures interactions between dimensions. To handle more complex distributions, nonparametric density estimators are used, such as kernel density estimation (KDE), histogram estimators, and Gaussian mixture models (GMMs). KDE has theoretical advantages over histogram estimators, and advantages over GMMs in parameter selection and fitting, making it the most popular nonparametric model. These classic nonparametric models perform quite well on low-dimensional data but cannot cope with high-dimensional data; deep statistical models can overcome this limitation.
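As a concrete illustration of the simplest baseline above, a small sketch (my own construction, not from the paper) that scores test points by their Mahalanobis distance to the training data:

```python
import numpy as np

def mahalanobis_scores(X_train, X_test, eps=1e-6):
    """Anomaly score = Mahalanobis distance to the training mean."""
    mu = X_train.mean(axis=0)
    # Slight regularization keeps the covariance matrix invertible.
    cov = np.cov(X_train, rowvar=False) + eps * np.eye(X_train.shape[1])
    cov_inv = np.linalg.inv(cov)
    diff = X_test - mu
    # Quadratic form diff^T * cov_inv * diff, computed per row.
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
```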
4.2 Energy-Based Models
4.3 Neural Generative Models (VAEs and GANs)
4.4 Normalizing Flows
5. One-Class Classification
One-class classification (also called single-class classification) aims to directly learn a decision boundary corresponding to a density level set of the normal distribution P+, so as to minimize error.
5.1 The one-class classification objective
The goal of these methods is to learn a decision boundary that reduces two rates: (1) the false alarm rate, flagging normal samples as anomalous; (2) the miss rate, flagging anomalous samples as normal. The first goal is trivially achieved by a boundary that encloses all samples, but then normal and anomalous samples are lumped together and cannot be distinguished. The boundary should therefore also exclude anomalies as far as possible. In practice, one typically fixes a false alarm rate a priori and then optimizes the miss rate, as in the sketch below.
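A minimal sketch of fixing the false alarm rate a priori, here with scikit-learn's OneClassSVM: its ν parameter upper-bounds the fraction of training samples treated as outliers; ν = 0.05 and the toy data are illustrative choices.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))          # normal data only
X_test = np.vstack([rng.normal(size=(5, 2)),
                    rng.uniform(-6, 6, size=(5, 2))])

# nu acts as the a-priori false alarm rate: an upper bound on the
# fraction of training samples allowed to fall outside the boundary.
oc_svm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

scores = -oc_svm.decision_function(X_test)   # higher = more anomalous
labels = oc_svm.predict(X_test)              # +1 = normal, -1 = anomalous
print(labels)
```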
5.2 One-Class Classification in Input Space
5.3 Kernel-Based One-Class Classification
5.4 Deep One-Class Classification
5.5 Negative Examples
One-class classifiers can exploit labeled negative samples, which enable an empirical estimate of the miss rate. We summarize three kinds of negative examples: artificial, auxiliary, and true negative samples.
(1) Artificial
The idea of using synthetic data to turn an unsupervised problem into a supervised one has been around for some time. If we assume the data are generated from some anomaly distribution, we can simply train a binary classifier to distinguish normal samples from synthetic anomalies. If that distribution is uniform, this works reasonably well, but uniform sampling becomes ineffective in high-dimensional spaces. Many other sampling schemes have therefore been proposed, such as resampling schemes, manifold sampling, sampling based on local density estimation, and active learning strategies. A toy sketch follows.
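A minimal sketch of the uniform-negatives idea (my own construction, valid only under the low-dimensional assumption stated above):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_against_uniform_negatives(X_normal, seed=0):
    """Turn one-class AD into binary classification against uniform noise."""
    rng = np.random.default_rng(seed)
    lo, hi = X_normal.min(axis=0), X_normal.max(axis=0)
    X_fake = rng.uniform(lo, hi, size=X_normal.shape)   # synthetic anomalies
    X = np.vstack([X_normal, X_fake])
    y = np.r_[np.zeros(len(X_normal)), np.ones(len(X_fake))]  # 1 = anomaly
    return RandomForestClassifier(n_estimators=100, random_state=seed).fit(X, y)

# clf.predict_proba(X_test)[:, 1] then serves as the anomaly score.
```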
(2) Auxiliary
Recently, some works use publicly available data as auxiliary negative samples, for example images from image-sharing websites in computer vision tasks, or the English Wikipedia corpus in NLP tasks. These auxiliary samples inject additional domain knowledge. This approach is called Outlier Exposure, and it can significantly improve anomaly detection performance on some tasks.
(3) True
The most informative labels are of course genuinely anomalous samples, for example anomalies labeled by domain experts. Several works show that even a handful of labels can significantly improve model performance. Other works propose active learning algorithms that learn user-specific notions of anomalousness from subjective user feedback.
6. Reconstruction Models
The earliest neural network approaches in anomaly detection were reconstruction oriented. Reconstruction-based methods aim to learn a model that reconstructs normal samples well, while samples that cannot be reconstructed accurately are flagged as anomalous. Most of these methods have a purely geometric motivation, for example PCA and deterministic AEs.
This section defines the learning objectives of these methods, their underlying assumptions, and the design of their anomaly scores.
6.1 The reconstruction objective
Reconstruction anomaly score
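A minimal sketch of a reconstruction anomaly score, s(x) = ||x - (decode ∘ encode)(x)||^2, instantiated here with linear PCA via scikit-learn; the number of components and the toy data are illustrative choices:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10))
X_test = rng.normal(size=(20, 10))

pca = PCA(n_components=3).fit(X_train)        # linear "encoder"/"decoder"
X_recon = pca.inverse_transform(pca.transform(X_test))

# Reconstruction anomaly score: squared reconstruction error per sample.
scores = np.sum((X_test - X_recon) ** 2, axis=1)
```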
6.2 Principal Component Analysis
6.3 Autoencoders
6.4 Prototypical Clustering
Clustering methods based on the prototype assumption form a class of reconstruction-based anomaly detection: the reconstruction error is the distance from a sample to its prototype. Common methods of this kind include VQ algorithms, k-means, k-medians, and k-medoids, which define Voronoi partitionings of a metric space (usually the input space). There is also work applying kernelized variants of k-means and Gaussian mixture models to anomaly detection. See the sketch below.
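A minimal sketch of prototype-based scoring with k-means (scikit-learn); the number of prototypes is an illustrative choice:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
X_test = rng.uniform(-6, 6, size=(10, 2))

kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_train)

# transform() returns distances to all centroids; the distance to the
# nearest prototype serves as the reconstruction anomaly score.
scores = kmeans.transform(X_test).min(axis=1)
```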
7. Unifying View of Anomaly Detection
This section proposes a unified framework covering many anomaly detection methods, which facilitates transferring algorithmic ideas between existing methods and thereby suggests new research directions.
7.1 Modeling Dimensions of the AD Problem
We decompose anomaly detection models along five dimensions:
- Loss: semisupervised and supervised losses involve labels; unsupervised losses do not.
- Model: the model maps a sample to a real value that enters the loss. Because anomaly detection is closely related to density estimation, many methods have equivalent likelihood models.
- Feature map: the part of the model that projects the raw data space into a new feature space; in kernel methods it is the kernel, in deep models it is the neural network.
- Regularization: regularizers take many forms, for example penalty terms on the model parameters and on the feature map parameters.
- Inference mode: present when the anomaly detection model involves Bayesian inference.
Learning objective
Based on the above five dimensions, we can write down a general learning objective shared by anomaly detection models, as follows:
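The objective itself did not survive in these notes. Paraphrasing the paper from memory, it is roughly of the form

$$\min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}\bigl(f_\theta(x_i),\, y_i\bigr) \;+\; \mathcal{R}(\theta),$$

where $\mathcal{L}$ is the (possibly label-dependent) loss, $f_\theta$ is the model including its feature map, and $\mathcal{R}$ is the regularizer; Bayesian methods additionally specify an inference mode.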
7.2 Comparison and discussion
Comparing methods
(1) Many probabilistic methods rely on a negative log-likelihood objective, under which the model can rank samples.
(2) Reconstruction methods can also rank samples; they are especially suitable for data with manifold or prototypical structure.
(3) One-class classification methods do not rank samples per se. They clearly provide less information, but are more efficient and robust. The distance of a point to the decision boundary can still be used as a ranking score, though it may not reflect the structure of the data.
Transferring ideas
Within the unified framework, we can also see that some ideas of shallow and deep anomaly detection models transfer to each other. For example:
(1) Kernel SVDD and Deep SVDD both use a hypersphere model.
(2) Deep AEs usually define the reconstruction error in the raw data space, whereas kPCA defines it in the kernel feature space. Would defining the error in a neural network feature space also benefit AEs? Some recent work defines the reconstruction error in the hidden layers of an AE.
(3) Would embedding the prototype assumption into deep AEs be useful, and how? The VQ-VAE model introduces a discrete codebook between encoder and decoder, thereby embedding the prototype assumption; it is a worthwhile example.
7.3 Distance-Based Anomaly Detection
We have provided a unified framework for loss-based anomaly detection methods. Beyond these, there is another family of distance-based anomaly detection models, with much existing work in the data mining community. Many of these methods adopt a lazy learning strategy: no model is trained in advance, and evaluation happens only when a new test sample arrives. This family includes nearest-neighbor methods such as LOF, and partitioning trees such as isolation forests. Such methods have rarely been combined with deep learning. A sketch of both follows.
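A small sketch of the two detectors named above, using their scikit-learn implementations (parameter values and toy data are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))
X_test = np.vstack([rng.normal(size=(5, 2)),
                    rng.uniform(-6, 6, size=(5, 2))])

iso = IsolationForest(n_estimators=100, random_state=0).fit(X_train)
iso_scores = -iso.score_samples(X_test)              # higher = more anomalous

# novelty=True lets LOF score previously unseen test points.
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
lof_scores = -lof.score_samples(X_test)
```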
8. Evaluation and Explanation
In practice, the first issue in anomaly detection is benchmark datasets, and the second is how to evaluate performance; in recent years, model interpretability has also attracted much attention. This section explores these three issues.
8.1 Building Anomaly Detection Benchmarks
Existing benchmarks can be roughly divided into three categories.
(1) k-classes-out: starting from an existing binary or multiclass dataset, designate one or more classes as normal and the rest as anomalous. Because all anomalies then belong to known classes, such benchmarks do not simulate the characteristics of real anomalies well.
(2) Synthetic: synthesize anomalies from supervised or unsupervised datasets. Statistically, synthesis allows controlling the error within a certain range and covering anomalies more comprehensively, but real anomalies may be unknown or hard to synthesize.
(3) Real-world: the ideal benchmarks, with anomalies labeled by human experts, who can also indicate which features of a sample relate to the anomaly.
Notably, compared with the first two categories, the anomalies in real-world datasets are far more diverse.
8.2 Evaluating Anomaly Detectors
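The body of this subsection is missing from these notes. For reference, the standard practice is threshold-free evaluation of anomaly scores, for example via AUROC and average precision; a minimal sketch, assuming ground-truth labels and detector scores:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = np.array([0, 0, 0, 0, 1, 1])          # 1 = anomaly
scores = np.array([0.1, 0.3, 0.2, 0.4, 0.9, 0.35])

print("AUROC:", roc_auc_score(y_true, scores))
print("AP:   ", average_precision_score(y_true, scores))
```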
8.3 Explaining Anomalies
Interpreting model predictions is quite common in the supervised field; this research direction is called XAI (Explainable Artificial Intelligence). Popular methods include LIME, Grad-CAM, integrated gradients, and layerwise relevance propagation (LRP).
XAI has been brought into unsupervised learning and anomaly detection. In supervised scenarios the mainstream models are deep networks, which can be explained within a unified framework fairly easily. Unsupervised scenarios, by contrast, host many strongly heterogeneous methods, kernel based, probabilistic, and neural network based, so designing a unified explanation framework is not easy. Toward consistent explanations, the following two directions look promising:
(1) Model-agnostic explanation techniques, which do not depend on the model itself but on other ingredients such as sampling.
(2) Neuralization, that is, converting non-neural-network models into equivalent neural networks.
9. Conclusion and Outlook
9.1 Unexplored Combinations of Modeling Dimensions
Much anomaly detection work already exists along the different modeling dimensions. This review has surveyed, at a conceptual level, the parallels between kernel methods and deep learning methods for anomaly detection. The design ideas of these two families differ considerably, and we believe they can borrow from each other. Directions we consider worth exploring:
(1) Many shallow methods address robustness to noise, sample contamination, and low signal-to-noise ratios, but few corresponding deep methods exist.
(2) Bayesian inference has belonged to the shallow methods; recently, approximate Bayesian methods and Bayesian neural networks use uncertainty estimates to complement anomaly scores.
(3) In semisupervised anomaly detection, ideas from kernel methods have migrated to deep one-class classification, whereas probabilistic and reconstruction methods that could exploit labeled anomalies have received relatively little attention. Moreover, for time-series anomaly detection, exploiting labeled samples could greatly improve performance. Concepts from density ratio estimation, noise contrastive estimation, or coding theory could also drive innovation here.
(4) Active learning strategies for anomaly detection have been introduced for shallow detectors; they could be extended to deep methods.
9.2 Bridging Related Lines of Research on Robustness
9.3 Interpretability and Trustworthiness
Part of anomaly detection research focuses on improving detection accuracy, but the interpretability and trustworthiness of models are equally important. Interpretability enhances model transparency, which facilitates reliable decision making, explains model failures, and exposes model flaws.
Anomalies have so far been explained by: (1) finding subspaces of anomaly-discriminative features; (2) deducing sequential feature explanations; (3) using feature-wise reconstruction errors; (4) employing fully convolutional architectures; (5) explaining anomalies via integrated gradients or LRP.
Despite this work, papers on interpretability and trustworthiness remain scarce. The challenges are: (1) anomalies are likely heterogeneous, with different generating mechanisms; (2) an anomaly may arise from anomalous patterns or merely from the absence of normal patterns; explanations of the former can focus on anomalous features, whereas the latter has no obvious counterpart; (3) as tasks and data grow more complex, explanation becomes ever harder.
9.4 Need for Challenging and Open Data Sets
Clear evaluation metrics and open datasets are essential for advancing a research field; the success of computer vision, for example, undoubtedly owes much to the public ImageNet database and its competitions. Existing work on deep anomaly detection, OOD detection, and open-set recognition still largely repurposes classification datasets, treating some classes as anomalous and the rest as normal, or combines in-distribution and out-of-distribution data. While such synthetic setups have some value, whether they truly reflect the nature of anomalies is debatable.
Notably, only a few methods hold an absolute advantage across most datasets, which means we may be biased when evaluating new methods: we tend to analyze only their strengths, not their weaknesses and limitations. Judging when, or to what degree, a model fails is important and deserves more attention, and that calls for diverse data to support model evaluation.
Recent datasets and competitions, such as MVTec-AD and the Medical Out-of-Distribution Analysis Challenge, are good examples of expanding research in this field. We need more such datasets and competitions.
9.5 Weak Supervision and Self-Supervised Learning
Much anomaly detection work focuses on unsupervised learning. Recent progress shows that weakly supervised and self-supervised learning can tackle more complex detection tasks.
Weak supervision
Weak supervision learns from small amounts of labeled data or imperfect labels. The following three underexplored directions deserve attention:
(1) Current SSAD models show that even a few labeled anomalies can significantly improve performance on complex data. The paradigm combines semisupervised learning with active learning to better identify samples that are informative for prediction. Concretely, the model interacts with experts, presenting unlabeled data for them to label, and iterates, so that good performance is reached with little labeled data.
(2) Using large amounts of public data from some domain as auxiliary negative samples is, to some extent, also weak supervision. Although these negative samples may not reflect the characteristics of real anomalies, they reinforce the learned representation of normal samples.
(3) Applying transfer learning to anomaly detection, so that more domain knowledge is distilled into the model.
For anomaly detection on semantically rich, high-dimensional data, weak supervision and the injection of domain priors are both effective ways to attack the problem.
Self-supervision
Self-supervision learns sample representations through auxiliary (pretext) tasks. These tasks require no sample labels, so self-supervision suits scenarios with large amounts of unlabeled samples and is very attractive for anomaly detection.
Self-supervision has been brought into visual anomaly detection through multiclass models in which the pseudo-labels of images come from geometric transformations of the images. The softmax output of the model's last layer yields a predictive distribution; the closer it is to uniform, the higher the predictive uncertainty, indicating an anomaly. These methods achieve remarkable results on k-classes-out image benchmarks. Bergman et al. recently proposed generalizing them to non-image data via affine transformations; this method is called GOAD.
Combining self-supervision with contrastive learning is also a promising research direction.
Broadly speaking, an interesting research question about self-supervision is: to what extent does it learn the semantic representations needed for anomaly detection tasks? This question is worth exploring.
9.6 Foundation and Theory
Recent progress in anomaly detection raises several fundamental questions, including (1) the generalization ability of the various methods, (2) the definition of anomalies in high-dimensional spaces, and (3) theoretical explanations of prediction results.
Nalisnick et al. observed that deep generative models, such as normalizing flows, VAEs, and autoregressive models, often assign higher likelihoods to anomalies, and much follow-up work confirms the phenomenon. It is counterintuitive and shows how severely we lack theoretical explanations for these models. There is ample evidence that one cause is a strong bias of generative models toward low-level background statistics, so that simple data points easily attain high likelihoods. Another explanation is that, for data in high-dimensional spaces, high-density regions do not necessarily carry high probability mass, and points from high-probability-mass regions are the ones likely to appear. This phenomenon challenges standard density modeling and likelihood-based anomaly detection, and is therefore worth exploring.
Reconstruction-based models can reconstruct simple OOD points lying within the convex hull of the data very well. For example, an AE trained on MNIST digits can accurately reconstruct an anomalous all-black image, which is clearly not what anomaly detection wants. We should therefore seek a theoretical understanding of this OOD behavior.
Deep learning also opens new possibilities for explaining and analyzing AD problems from different theoretical perspectives. For example, AEs can be understood information-theoretically: following the infomax principle, they implicitly maximize the mutual information between the input and the latent representation. Similarly, from an information-theoretic view, VAEs aim to balance the compression of the latent space against reconstruction accuracy. Recently, this view has connected VAEs and Deep SVDD: Deep SVDD can be seen as a special case that maximizes compression only. Overall, little anomaly detection work takes the information-theoretic perspective, and we consider it a good research direction.