Statistical method for anomaly detection
2022-07-07 23:06:00 【Anny Linlin】
1、 The general idea: learn a generative model that fits the given data set, then identify the objects that lie in low-probability regions of that model and treat them as outliers.
2、 Statistical methods for anomaly detection fall into two main types: parametric methods and nonparametric methods.
3、 Parametric methods
3.1 Univariate outlier detection based on normal distribution
Data involving only one attribute or variable is called univariate data. We assume that the data is generated by a normal distribution, learn the parameters of that distribution (mean and standard deviation) from the input data, and flag the points with low probability under it as outliers.
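As a minimal sketch of this idea, the snippet below estimates the mean and standard deviation from the data and flags values that lie more than k standard deviations from the mean. The function name `normal_outliers`, the threshold `k`, and the sample temperatures are illustrative assumptions, not part of the original text. Note that on very small samples a single extreme value inflates the estimated standard deviation, so a threshold lower than the usual 3 is used here.

```python
import statistics

def normal_outliers(values, k=3.0):
    """Return the values lying more than k standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # maximum-likelihood estimate of sigma
    if sigma == 0:
        return []  # all values identical: nothing can be flagged
    return [x for x in values if abs(x - mu) / sigma > k]

# Hypothetical daily temperatures; 24.0 is far below the rest.
temperatures = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]
print(normal_outliers(temperatures, k=2.0))  # -> [24.0]
```

With ten points and the population standard deviation, no z-score can exceed 3, which is why `k=2.0` is used in this example.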
3.2 Multivariate outlier detection
Data involving two or more attributes or variables is called multivariate data. Many univariate outlier detection methods can be extended to handle multivariate data. The core idea is to transform the multivariate outlier detection task into a univariate outlier detection problem. For example, when univariate outlier detection based on the normal distribution is extended to the multivariate case, you can compute the mean and standard deviation of each dimension and apply the univariate test dimension by dimension.
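The per-dimension extension could look roughly like the sketch below: compute a mean and standard deviation for every dimension, then flag a row if its z-score exceeds a threshold in at least one dimension. The name `per_dimension_outliers`, the "any dimension" rule, the threshold `k`, and the sample rows are assumptions made for illustration. The sketch treats the dimensions as independent, so it ignores correlations between them.

```python
import statistics

def per_dimension_outliers(rows, k=3.0):
    """Flag rows whose z-score exceeds k in at least one dimension."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    flagged = []
    for row in rows:
        # A constant dimension (std == 0) contributes a z-score of 0.
        z = [abs(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, stds)]
        if max(z) > k:
            flagged.append(row)
    return flagged

# Hypothetical two-dimensional data; the last row is extreme in dimension 2.
rows = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.8), (1.0, 10.1),
        (1.2, 9.9), (0.8, 10.0), (1.0, 30.0)]
print(per_dimension_outliers(rows, k=2.0))  # -> [(1.0, 30.0)]
```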
4、 Nonparametric methods
In nonparametric anomaly detection, the model of "normal data" is learned from the input data rather than assumed a priori. Nonparametric methods usually make fewer assumptions about the data, so they can be applied in more situations.
Example: using a histogram to detect outliers.
A histogram is a frequently used nonparametric statistical model that can be used to detect outliers. The process consists of the following two steps:
step 1: Construct the histogram. Use the input data (training data) to construct a histogram. The histogram can be univariate, or multivariate if the input data is multidimensional.
Although nonparametric methods do not assume any prior statistical model, they often still require the user to provide parameters in order to learn from the data. For example, the user must specify the type of histogram (equal-width or equal-depth) and other parameters (the number of bins or the size of each bin). Unlike in parametric methods, these parameters do not specify the type of data distribution.
step 2: Detect outliers. To determine whether an object is an outlier, check it against the histogram. In the simplest approach, if the object falls into one of the bins of the histogram, it is considered normal; otherwise it is considered an outlier.
In a more sophisticated approach, the histogram is used to assign each object an outlier score. For example, the outlier score of an object can be the reciprocal of the volume of the bin that the object falls into.
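The two steps above can be sketched for the univariate, equal-width case as follows. This is only an illustration under stated assumptions: the "volume" of a bin (box) is read here as the number of training objects it contains, empty bins and out-of-range values get an infinite score, and the names `bin_index`, `build_histogram`, and `outlier_score` as well as the sample data are hypothetical. The training data is assumed to contain at least two distinct values.

```python
import math

def bin_index(x, lo, hi, n_bins):
    """Index of the equal-width bin containing x, or None if x is outside [lo, hi]."""
    if x < lo or x > hi:
        return None
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)

def build_histogram(train, n_bins):
    """Step 1: count the training objects per equal-width bin."""
    lo, hi = min(train), max(train)
    counts = [0] * n_bins
    for x in train:
        counts[bin_index(x, lo, hi, n_bins)] += 1
    return lo, hi, counts

def outlier_score(x, hist):
    """Step 2: reciprocal of the bin count; inf for empty bins or out-of-range x."""
    lo, hi, counts = hist
    idx = bin_index(x, lo, hi, len(counts))
    if idx is None or counts[idx] == 0:
        return math.inf
    return 1.0 / counts[idx]

train = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 4.0, 5.0, 20.0]
hist = build_histogram(train, n_bins=5)
print(hist[2])                    # -> [8, 1, 0, 0, 1]
print(outlier_score(3.0, hist))   # -> 0.125 (dense bin, low score)
print(outlier_score(10.0, hist))  # -> inf (empty bin)
```

The simplest yes/no rule from step 2 corresponds to checking whether the score is infinite.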
One disadvantage of using a histogram as a nonparametric model for outlier detection is that it is hard to choose a suitable bin size. On the one hand, if the bins are too small, many normal objects fall into empty or sparse bins and are mistakenly identified as outliers. On the other hand, if the bins are too large, outliers may slip into frequent bins and thus pass as normal.
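The effect of the bin (box) size can be seen on a small made-up data set. The sketch below flags every point whose equal-width bin holds fewer than `min_count` points, then varies the number of bins. The function name `sparse_bin_points`, the `min_count` threshold, and the data are assumptions chosen for illustration; 9.0 is the intended outlier.

```python
def sparse_bin_points(data, n_bins, min_count=2):
    """Return the points whose equal-width bin holds fewer than min_count points."""
    lo, hi = min(data), max(data)

    def index(x):
        return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)

    counts = [0] * n_bins
    for x in data:
        counts[index(x)] += 1
    return [x for x in data if counts[index(x)] < min_count]

data = [1.0, 2.0, 2.5, 3.0, 3.0, 3.5, 4.0, 4.5, 5.5, 6.0, 9.0]
print(sparse_bin_points(data, n_bins=2))        # -> [] (bins too wide: 9.0 hides)
print(sparse_bin_points(data, n_bins=4))        # -> [9.0]
print(len(sparse_bin_points(data, n_bins=16)))  # -> 9 (bins too narrow)
```

With 2 bins the outlier shares a bin with normal points and goes undetected; with 16 bins almost every normal point sits alone in its bin and is flagged.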
5、 HBOS
HBOS stands for Histogram-based Outlier Score. It is a combination of univariate methods: it cannot model dependencies between features, but it is fast and scales well to large data sets. Its basic assumption is that the dimensions of the data set are independent of each other. Each dimension is divided into intervals (bins); the higher the density of an interval, the lower the anomaly score.
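A simplified sketch of this scoring scheme is shown below. It assumes equal-width bins, normalizes each histogram so that its tallest bin has height 1, and sums log(1 / height) over the dimensions, so that dense bins contribute little and sparse bins contribute a lot. The function name `hbos_scores`, the default `n_bins`, and the sample rows are illustrative assumptions; this is not a full HBOS implementation (for example, it does not support dynamic bin widths).

```python
import math

def hbos_scores(rows, n_bins=5):
    """Return one HBOS-style score per row; higher means more anomalous."""
    scores = [0.0] * len(rows)
    for col in zip(*rows):                # one histogram per dimension
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0           # constant column: everything in bin 0

        def index(x):
            return min(int((x - lo) / span * n_bins), n_bins - 1)

        counts = [0] * n_bins
        for x in col:
            counts[index(x)] += 1
        tallest = max(counts)
        for i, x in enumerate(col):
            height = counts[index(x)] / tallest  # normalized density in (0, 1]
            scores[i] += math.log(1.0 / height)  # dense bin -> small contribution
    return scores

# Hypothetical data; the last row is isolated in the first dimension.
rows = [(1.0, 10.0), (1.25, 12.0), (0.75, 8.0),
        (1.0, 10.5), (1.5, 9.5), (5.0, 10.25)]
scores = hbos_scores(rows)
print(scores.index(max(scores)))  # -> 5
```

Because each dimension is scored separately and the contributions are simply added, the sketch reflects the independence assumption stated above.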
6、 Practice