Statistical method for anomaly detection
2022-07-07 23:06:00 【Anny Linlin】
1. The general idea: learn a generative model that fits the given data set, then identify the objects that fall in low-probability regions of the model and treat them as outliers.
2. Statistical methods for anomaly detection fall into two main types: parametric methods and nonparametric methods.
3. Parametric methods
3.1 Univariate outlier detection based on the normal distribution
Data involving only one attribute or variable is called univariate data. We assume the data is generated by a normal distribution; the parameters of that distribution (mean and standard deviation) can then be learned from the input data, and points with low probability under it are identified as outliers.
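As a minimal sketch of this idea, the snippet below fits a normal distribution to a one-dimensional sample and flags points that lie far from the mean. The function name `normal_outliers`, the 3-standard-deviation cutoff, and the sample values are assumptions made for illustration; the text itself does not fix a threshold.

```python
import numpy as np

def normal_outliers(values, threshold=3.0):
    """Flag values whose distance from the fitted mean exceeds
    `threshold` standard deviations (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    mu = x.mean()      # maximum-likelihood estimate of the mean
    sigma = x.std()    # maximum-likelihood estimate of the std (ddof=0)
    z = np.abs(x - mu) / sigma
    return z > threshold

# Illustrative data: 99 evenly spaced readings plus one extreme value.
readings = np.append(np.linspace(48.0, 52.0, 99), 80.0)
flags = normal_outliers(readings)
print(readings[flags])
```

Note that on very small samples a fixed cutoff of 3 can never be exceeded, because a single extreme value also inflates the estimated standard deviation; the cutoff is a tuning choice, not part of the method.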
3.2 Multivariate outlier detection
Data involving two or more attributes or variables is called multivariate data. Many univariate outlier detection methods can be extended to handle multivariate data. The core idea is to transform the multivariate outlier detection task into a univariate outlier detection problem. For example, when univariate outlier detection based on the normal distribution is extended to the multivariate case, the mean and standard deviation can be computed for each dimension.
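One way to read the per-dimension extension is sketched below: estimate a mean and standard deviation for every column, compute a z-score per dimension, and reduce the result to a single number per object. Taking the largest per-dimension z-score as that number is an assumption of this sketch, as are the function name and the data; the approach also ignores correlations between dimensions.

```python
import numpy as np

def per_dimension_outliers(X, threshold=3.0):
    """Flag rows whose largest per-dimension z-score exceeds `threshold`
    (hypothetical helper; the max-reduction is an illustrative choice)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)       # mean of each dimension
    sigma = X.std(axis=0)     # standard deviation of each dimension
    z = np.abs(X - mu) / sigma
    return z.max(axis=1) > threshold

# Illustrative data: 99 regular points plus one point that is extreme
# in the second dimension only.
regular = np.column_stack([np.linspace(0.0, 1.0, 99),
                           np.linspace(10.0, 20.0, 99)])
X = np.vstack([regular, [0.5, 60.0]])
flags = per_dimension_outliers(X)
print(np.flatnonzero(flags))
```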
4. Nonparametric methods
In nonparametric anomaly detection, the model of "normal data" is learned from the input data rather than assumed a priori. Nonparametric methods usually make fewer assumptions about the data, so they can be applied in more situations.
Example: using a histogram to detect outliers.
The histogram is a frequently used nonparametric statistical model that can be used to detect outliers. The process consists of the following two steps:
Step 1: Construct the histogram. Use the input data (training data) to build a histogram. The histogram can be univariate, or multivariate if the input data is multidimensional.
Although nonparametric methods do not assume any prior statistical model, the user often still has to supply parameters so that the model can be learned from the data. For example, the user must specify the type of histogram (equal-width or equal-depth) and other parameters (the number of bins in the histogram or the size of each bin). Unlike in parametric methods, these parameters do not specify the type of data distribution.
Step 2: Detect outliers. To determine whether an object is an outlier, check it against the histogram. In the simplest approach, if the object falls into one of the histogram's bins it is considered normal; otherwise it is considered an outlier.
In more sophisticated approaches, the histogram is used to assign each object an outlier score. For example, the outlier score of an object can be the reciprocal of the volume of the bin it falls into.
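The two steps above can be sketched with an equal-width histogram built by `numpy.histogram`. This sketch makes two assumptions that the text does not spell out: the "volume" of a bin is read as the number of training objects it contains, and an object that lands in an empty bin or outside the histogram range receives an infinite score. The function names and the data are illustrative only.

```python
import numpy as np

def fit_histogram(train, bins=5):
    """Step 1: build an equal-width histogram from the training data."""
    counts, edges = np.histogram(train, bins=bins)
    return counts, edges

def histogram_scores(x, counts, edges):
    """Step 2: score = 1 / (training objects in the bin); inf when the
    bin is empty or the value lies outside the histogram range."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    idx = np.searchsorted(edges, x, side="right") - 1
    idx[x == edges[-1]] = len(counts) - 1      # last bin is closed on the right
    inside = (idx >= 0) & (idx < len(counts))
    bin_counts = np.zeros(x.shape, dtype=float)
    bin_counts[inside] = counts[idx[inside]]
    with np.errstate(divide="ignore"):
        return 1.0 / bin_counts

# Illustrative training data with a gap between 4 and 8.
train = np.concatenate([np.linspace(0.0, 4.0, 50), np.linspace(8.0, 10.0, 20)])
counts, edges = fit_histogram(train, bins=5)
scores = histogram_scores([1.0, 7.0, 25.0], counts, edges)
print(scores)
```

Under the simplest rule from Step 2, any object with an infinite score here would be reported as an outlier, while finite scores allow objects to be ranked.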
One disadvantage of using a histogram as a nonparametric model for outlier detection is that it is hard to choose a suitable bin size. On the one hand, if the bins are too small, many normal objects will fall into empty or sparse bins and will be mistaken for outliers. On the other hand, if the bins are too large, outliers may slip into frequent bins and thus pass as normal.
5. HBOS
HBOS stands for Histogram-based Outlier Score. It is a combination of univariate methods: it cannot model dependencies between features, but it is fast and scales well to large data sets. Its basic assumption is that the dimensions of the data set are independent of one another. Each dimension is divided into intervals (bins); the higher the density of an interval, the lower the anomaly score.
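A rough sketch of this scheme follows. It builds one equal-width histogram per dimension, rescales the bin heights so the tallest bin has height 1, and sums log(1 / height) over the dimensions, so dense bins contribute little and sparse bins contribute a lot. The rescaling, the logarithm, the `eps` floor for empty bins, and the function name are assumptions of this sketch rather than details given in the text.

```python
import numpy as np

def hbos_scores(X, bins=10, eps=1e-9):
    """Sum of per-dimension log(1 / normalized bin height)
    (hypothetical helper; treats every dimension independently)."""
    X = np.asarray(X, dtype=float)
    scores = np.zeros(X.shape[0])
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=bins)
        heights = counts / counts.max()            # tallest bin -> 1.0
        # Interior edges map every value to a bin index in 0..bins-1.
        idx = np.digitize(X[:, j], edges[1:-1])
        scores += np.log(1.0 / np.maximum(heights[idx], eps))
    return scores

# Illustrative data: 100 regular points plus one isolated point.
base = np.linspace(0.0, 1.0, 100)
X = np.vstack([np.column_stack([base, base[::-1]]), [6.0, 6.0]])
scores = hbos_scores(X)
print(int(np.argmax(scores)))
```

Because each dimension is scored on its own, an object that is unremarkable in every single dimension but unusual in combination would not stand out, which is the dependency limitation noted above.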
6. Practice