Statistical method for anomaly detection
2022-07-07 23:06:00 【Anny Linlin】
1、 The general idea: learn a generative model that fits the given data set, then identify the objects that fall in low-probability regions of that model and treat them as outliers.
2、 Statistical methods for anomaly detection fall into two main types: parametric methods and nonparametric methods.
3、 Parametric methods
3.1 Univariate outlier detection based on the normal distribution
Data involving only one attribute or variable is called univariate data. We assume the data is generated by a normal distribution; the parameters of that distribution can then be learned from the input data, and points with low probability are identified as outliers.
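As a minimal sketch of this idea, the snippet below estimates the mean and standard deviation from the data and flags values whose z-score magnitude exceeds a cutoff. The function name `normal_outliers`, the cutoff of 2.5, and the sample temperatures are illustrative assumptions, not part of the original article.

```python
import statistics

def normal_outliers(values, threshold=3.0):
    """Return the values whose |z-score| exceeds `threshold` (hypothetical helper)."""
    mu = statistics.fmean(values)       # estimated mean
    sigma = statistics.pstdev(values)   # maximum-likelihood standard deviation
    if sigma == 0:
        return []                       # all values identical: nothing stands out
    return [x for x in values if abs(x - mu) / sigma > threshold]

# Made-up temperature readings; 24.0 lies far from the rest.
temps = [24.0, 28.9, 28.9, 29.0, 29.1, 29.1, 29.2, 29.2, 29.3, 29.4]
print(normal_outliers(temps, threshold=2.5))  # flags 24.0
```

With only ten samples the largest possible z-score under the population standard deviation is 3, so a cutoff of 3 could never fire here; that is why the example uses 2.5. The cutoff is a tuning choice, not a fixed rule.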
3.2 Multivariate outlier detection
Data involving two or more attributes or variables is called multivariate data. Many univariate outlier detection methods can be extended to handle multivariate data. The core idea is to transform the multivariate outlier detection task into a univariate outlier detection problem. For example, when univariate outlier detection based on the normal distribution is extended to the multivariate case, the mean and standard deviation of each dimension can be computed.
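One possible reading of the per-dimension extension is sketched below: compute the mean and standard deviation of every column, then flag a row if any coordinate has a large z-score. The helper name `per_dimension_outliers`, the cutoff, and the data are assumptions made for illustration.

```python
import statistics

def per_dimension_outliers(rows, threshold=3.0):
    """Flag rows with at least one coordinate whose |z-score| exceeds `threshold`."""
    columns = list(zip(*rows))                        # one tuple per dimension
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    flagged = []
    for row in rows:
        # z-score per dimension; a constant dimension contributes 0
        z = [abs(x - m) / s if s > 0 else 0.0
             for x, m, s in zip(row, means, stds)]
        if max(z) > threshold:
            flagged.append(row)
    return flagged

# Made-up 2-D measurements; the last row is extreme in the second dimension.
points = [(10.0, 100.0), (10.2, 101.0), (9.8, 99.0), (10.1, 102.0),
          (9.9, 98.0), (10.3, 100.0), (9.7, 101.0), (10.0, 99.0),
          (10.1, 100.0), (9.9, 101.0), (10.0, 160.0)]
print(per_dimension_outliers(points, threshold=2.5))  # flags (10.0, 160.0)
```

Treating each dimension separately ignores correlations between attributes. A point can look ordinary in every single dimension and still be unusual as a combination, which is why distance measures such as the Mahalanobis distance are a common alternative.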
4、 Nonparametric methods
In nonparametric anomaly detection, the model of “normal data” is learned from the input data rather than assumed a priori. Nonparametric methods usually make fewer assumptions about the data, so they apply in more situations.
Example: using a histogram to detect outliers.
The histogram is a frequently used nonparametric statistical model that can be used to detect outliers. The procedure has two steps:
Step 1: Construct the histogram. Build a histogram from the input data (training data). The histogram can be univariate, or multivariate if the input data is multidimensional.
Although nonparametric methods do not assume any prior statistical model, they often still require the user to supply parameters in order to learn from the data. For example, the user must specify the type of histogram (equal-width or equal-depth) and other parameters (the number of bins or the size of each bin). Unlike in parametric methods, these parameters do not specify the type of data distribution.
Step 2: Detect outliers. To decide whether an object is an outlier, check it against the histogram. In the simplest approach, an object that falls into one of the histogram's bins is considered normal; otherwise it is considered an outlier.
In more sophisticated approaches, the histogram is used to assign each object an outlier score. For example, the outlier score of an object can be the reciprocal of the volume of the bin it falls into.
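The two steps can be sketched as follows for the univariate, equal-width case. This sketch assumes that the "volume" of a bin means the number of training objects it contains; the function names, the bin width of 10, and the amounts are hypothetical.

```python
import math
from collections import Counter

def build_histogram(train, bin_width, origin=0.0):
    """Step 1: equal-width histogram mapping bin index -> number of training values."""
    return Counter(math.floor((x - origin) / bin_width) for x in train)

def outlier_score(x, hist, bin_width, origin=0.0):
    """Step 2: reciprocal of the count in x's bin; infinity if the bin is empty."""
    count = hist.get(math.floor((x - origin) / bin_width), 0)
    return math.inf if count == 0 else 1.0 / count

# Made-up purchase amounts used as training data.
amounts = [12.0, 15.0, 18.0, 22.0, 25.0, 27.0, 31.0, 33.0, 38.0,
           41.0, 44.0, 47.0, 95.0]
hist = build_histogram(amounts, bin_width=10.0)

print(outlier_score(26.0, hist, 10.0))  # dense bin, low score
print(outlier_score(95.0, hist, 10.0))  # bin with a single value, score 1.0
print(outlier_score(70.0, hist, 10.0))  # empty bin, score is infinity
```

Returning infinity for an empty bin corresponds to the simplest rule in step 2 (no bin, therefore outlier), while the finite scores give a ranking among objects that do land in a bin.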
One disadvantage of using a histogram as a nonparametric model for outlier detection is that it is hard to choose a suitable bin size. On the one hand, if the bins are too small, many normal objects fall into empty or sparse bins and are mistakenly identified as outliers. On the other hand, if the bins are too large, outliers can slip into frequent bins and so “pass” as normal.
5、HBOS
HBOS stands for Histogram-based Outlier Score. It is a combination of univariate methods: it cannot model dependencies between features, but it is fast and scales well to large data sets. Its basic assumption is that the dimensions of the data set are independent of one another. Each dimension is divided into intervals (bins); the higher the density of an interval, the lower the anomaly score.
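A simplified sketch of the HBOS idea is given below. It assumes equal-width bins, scales each histogram so its tallest bin has height 1, and sums log(1 / height) over the features. The names `fit_histograms` and `hbos_score`, the `floor` used for empty bins, and the data are illustrative assumptions; this is not the implementation of any particular library.

```python
import math

def fit_histograms(rows, n_bins=5):
    """Build one equal-width histogram per feature, scaled so the tallest bin is 1."""
    models = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0          # constant feature: use width 1
        counts = [0] * n_bins
        for x in col:
            # clamp so the maximum value lands in the last bin
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        top = max(counts)
        models.append((lo, hi, width, [c / top for c in counts]))
    return models

def hbos_score(row, models, floor=1e-3):
    """Sum over features of log(1 / bin height); higher means more anomalous."""
    score = 0.0
    for x, (lo, hi, width, heights) in zip(row, models):
        if x < lo or x > hi:
            height = 0.0                           # outside the training range
        else:
            height = heights[min(int((x - lo) / width), len(heights) - 1)]
        score += math.log(1.0 / max(height, floor))  # floor avoids division by zero
    return score

# Made-up 2-D data; the last row is far from the others in both features.
rows = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (1.1, 9.5), (1.0, 10.2),
        (1.3, 10.8), (0.8, 9.8), (1.1, 10.1), (5.0, 30.0)]
models = fit_histograms(rows, n_bins=5)
scores = [hbos_score(r, models) for r in rows]
print(max(zip(scores, rows)))  # the highest score belongs to (5.0, 30.0)
```

Because the per-feature terms are simply added, the score reflects the independence assumption directly: each feature contributes on its own, and no interaction between features is considered.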
6、 practice