Statistical method for anomaly detection
2022-07-07 23:06:00 【Anny Linlin】
1、 The general idea: learn a generative model that fits the given data set, then identify the objects that fall in the low-probability regions of that model and treat them as outliers.
2、 Statistical methods for anomaly detection fall into two main types: parametric methods and nonparametric methods.
3、 Parametric methods
3.1 Univariate outlier detection based on the normal distribution
Data involving only one attribute or variable is called univariate data. We assume the data is generated by a normal distribution, learn the parameters of that distribution (the mean and standard deviation) from the input data, and flag the points with low probability as outliers.
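A minimal sketch of this idea, using only the Python standard library. The function name `normal_outliers`, the sample readings and the cut-off `threshold=2.5` are illustrative assumptions, not part of the original text; "low probability" is approximated here by the distance from the mean measured in standard deviations.

```python
import statistics

def normal_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = statistics.fmean(values)        # estimated mean
    sigma = statistics.pstdev(values)    # maximum-likelihood estimate of the std (divides by n)
    if sigma == 0:
        return []                        # all values identical: nothing to flag
    return [x for x in values if abs(x - mu) / sigma > threshold]

# Hypothetical temperature readings; 24.0 is far below the rest.
temperatures = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]
flagged = normal_outliers(temperatures, threshold=2.5)
print(flagged)
```

With only n samples, the largest possible distance from the mean is sqrt(n - 1) standard deviations (when the standard deviation divides by n), so for ten readings the usual cut-off of 3 can never be exceeded. That is why this small example uses 2.5.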
3.2 Multivariate outlier detection
Data involving two or more attributes or variables is called multivariate data. Many univariate outlier detection methods can be extended to handle multivariate data. The core idea is to transform the multivariate outlier detection task into a univariate outlier detection problem. For example, when univariate outlier detection based on the normal distribution is extended to the multivariate case, the mean and standard deviation can be estimated for each dimension separately.
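One way to use the per-dimension means and standard deviations is sketched below. It assumes the dimensions are independent, so the density of a point is the product of one normal density per dimension (summed here in log form), and the point with the lowest density is the most suspicious. The helper names `fit_per_dimension` and `log_density` and the sample rows are hypothetical.

```python
import math
import statistics

def fit_per_dimension(rows):
    """Estimate (mean, std) separately for each column of `rows`."""
    columns = list(zip(*rows))
    return [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]

def log_density(point, params):
    """Sum of per-dimension normal log-densities (assumes independent dimensions, std > 0)."""
    total = 0.0
    for x, (mu, sigma) in zip(point, params):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
    return total

# Hypothetical two-dimensional data; the last row is far from the others in both dimensions.
rows = [(1.0, 10.0), (1.2, 10.5), (0.9, 9.8), (1.1, 10.2), (1.0, 9.9), (5.0, 20.0)]
params = fit_per_dimension(rows)
densities = [log_density(r, params) for r in rows]
least_likely = rows[densities.index(min(densities))]
print(least_likely)
```

In practice a point would be flagged when its density falls below a chosen cut-off; picking that cut-off is left out of this sketch.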
4、 Nonparametric methods
In nonparametric methods for anomaly detection, the model of "normal data" is learned from the input data rather than assumed a priori. Nonparametric methods usually make fewer assumptions about the data, so they can be used in more situations.
Example: using a histogram to detect outliers.
The histogram is a frequently used nonparametric statistical model, and it can be used to detect outliers. The process consists of the following two steps:
Step 1: Construct the histogram. Use the input data (training data) to construct a histogram. The histogram can be univariate, or multivariate if the input data is multidimensional.
Although nonparametric methods do not assume any prior statistical model, the user is often still required to provide parameters in order to learn from the data. For example, the user must specify the type of histogram (equal-width or equal-depth) and other parameters (the number of bins in the histogram or the size of each bin). Unlike in parametric methods, these parameters do not specify the type of data distribution.
Step 2: Detect outliers. To determine whether an object is an outlier, check it against the histogram. In the simplest approach, if the object falls into one of the bins of the histogram, it is considered normal; otherwise it is considered an outlier.
In a more sophisticated approach, the histogram can be used to assign each object an outlier score. For example, the outlier score of an object can be the reciprocal of the volume of the bin that the object falls into.
One disadvantage of using a histogram as a nonparametric model for outlier detection is that it is hard to choose a suitable bin size. On the one hand, if the bin size is too small, many normal objects will fall into empty or sparse bins and be mistakenly identified as outliers. On the other hand, if the bin size is too large, outliers may slip into some frequent bins and thus be "disguised" as normal.
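The two steps above can be sketched as follows, assuming an equal-width univariate histogram. The "volume" of a bin is interpreted here as the number of training objects it contains, which is an assumption. Objects that fall into an empty bin or outside the training range receive an infinite score. The function names and the training values are hypothetical.

```python
import math

def bin_index(x, lo, hi, n_bins):
    """Index of the equal-width bin containing x, or None if x lies outside [lo, hi]."""
    if x < lo or x > hi:
        return None
    if hi == lo:
        return 0
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)  # the maximum goes in the last bin

def build_histogram(values, n_bins):
    """Step 1: count how many training values fall into each bin."""
    lo, hi = min(values), max(values)
    counts = [0] * n_bins
    for x in values:
        counts[bin_index(x, lo, hi, n_bins)] += 1
    return lo, hi, counts

def outlier_score(x, histogram):
    """Step 2: reciprocal of the bin count; infinite for empty bins or out-of-range values."""
    lo, hi, counts = histogram
    idx = bin_index(x, lo, hi, len(counts))
    if idx is None or counts[idx] == 0:
        return math.inf
    return 1.0 / counts[idx]

train = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]       # hypothetical training data
hist = build_histogram(train, n_bins=5)
for x in (3, 10, 20, 25):
    print(x, outlier_score(x, hist))
```

Rerunning the sketch with a different `n_bins` shows the sensitivity described above: with a single bin every training value gets the same low score, while with many narrow bins even ordinary values end up alone in sparse bins.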
5、 HBOS
HBOS stands for Histogram-based Outlier Score. It is a combination of univariate methods: it cannot model dependencies between features, but it is fast and well suited to large data sets. Its basic assumption is that the dimensions of the data set are independent of one another. Each dimension is divided into intervals (bins); the higher the density of the bin a value falls into, the lower its outlier score.
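A simplified sketch of an HBOS-style score, assuming equal-width bins and scoring only the training points themselves, so no bin that is looked up is empty. For each dimension the bin heights are normalised so that the tallest bin has height 1, and a point's score is the sum over dimensions of log(1 / height). The function name `hbos_scores`, the default `n_bins=5` and the sample rows are illustrative assumptions; library implementations differ in details such as dynamic bin widths and smoothing.

```python
import math

def hbos_scores(rows, n_bins=5):
    """HBOS-style score per row: sum over dimensions of log(1 / normalised bin height)."""
    scores = [0.0] * len(rows)
    for column in zip(*rows):                      # each dimension is treated independently
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0                    # avoid division by zero for constant columns
        idx = [min(int((x - lo) / span * n_bins), n_bins - 1) for x in column]
        counts = [0] * n_bins
        for i in idx:
            counts[i] += 1
        tallest = max(counts)
        for row_no, i in enumerate(idx):
            height = counts[i] / tallest           # in (0, 1]; dense bins are close to 1
            scores[row_no] += math.log(1.0 / height)
    return scores

# Hypothetical two-dimensional data; the last row is isolated in both dimensions.
rows = [(1.0, 10.0), (1.2, 10.5), (0.9, 9.8), (1.1, 10.2), (1.0, 9.9), (5.0, 20.0)]
scores = hbos_scores(rows)
print(rows[scores.index(max(scores))])
```

Because every dimension is processed on its own, the cost grows linearly with the number of rows and dimensions, which matches the speed advantage mentioned above. The price is that a point which is unusual only in its combination of feature values is not detected.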
6、 Practice