Statistical methods for anomaly detection
2022-07-07 23:06:00 【Anny Linlin】
1. General idea: learn a generative model that fits the given dataset, then identify the objects that fall in low-probability regions of the model and treat them as outliers.
2. Statistical methods for anomaly detection fall into two main types: parametric methods and nonparametric methods.
3. Parametric methods
3.1 Univariate outlier detection based on the normal distribution
Data involving only one attribute or variable is called univariate data. If we assume the data are generated by a normal distribution, the parameters of that distribution (its mean and standard deviation) can be learned from the input data, and points with low probability under the fitted distribution are identified as outliers.
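A minimal sketch of this idea in standard-library Python. It estimates the mean μ and standard deviation σ from the data and flags points whose z-score |x - μ| / σ exceeds a threshold. The 3σ cut-off (which leaves roughly 0.27% of a normal distribution in its two tails) is an assumed convention rather than something stated above, and the function name `gaussian_outliers` is hypothetical.

```python
import statistics

def gaussian_outliers(values, threshold=3.0):
    """Flag values whose z-score |x - mu| / sigma exceeds `threshold`.

    mu and sigma are the maximum-likelihood estimates of the normal
    distribution's parameters; the 3-sigma default is an assumed convention.
    """
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population std = MLE under normality
    if sigma == 0:
        return [False] * len(values)   # constant data: nothing stands out
    return [abs(x - mu) / sigma > threshold for x in values]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0]
print(gaussian_outliers(data))  # only the last entry (25.0) comes out True
```

Because μ and σ are estimated from data that may already contain outliers, extreme points inflate σ and can partly mask themselves; with very few samples a single outlier may not reach the threshold at all.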
3.2 Multivariate outlier detection
Data involving two or more attributes or variables is called multivariate data. Many univariate outlier detection methods can be extended to handle multivariate data. The core idea is to transform the multivariate outlier detection task into a univariate outlier detection problem. For example, when extending univariate outlier detection based on the normal distribution to the multivariate case, you can estimate the mean and standard deviation of each dimension.
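One way to express the per-dimension idea as code, sketched under the assumption that dimensions are treated independently: estimate the mean and standard deviation of each attribute, compute a z-score per attribute, and reduce each row to a single univariate number by taking its largest |z|. The max-based combination and the name `per_dimension_outliers` are illustrative choices, not part of the original text.

```python
import statistics

def per_dimension_outliers(rows, threshold=3.0):
    """Reduce a multivariate check to univariate ones.

    For each dimension j, estimate mean and std and compute |z_j|.
    A row is flagged if its largest |z_j| exceeds `threshold`
    (an illustrative combination rule that ignores correlations).
    """
    columns = list(zip(*rows))  # transpose: one tuple of values per dimension
    params = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    flags = []
    for row in rows:
        zs = [abs(x - mu) / sd if sd > 0 else 0.0 for x, (mu, sd) in zip(row, params)]
        flags.append(max(zs) > threshold)
    return flags

x1 = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.0]
x2 = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.0, 1.1, 0.9, 1.0, 5.0]
print(per_dimension_outliers(list(zip(x1, x2))))  # only the last row is flagged
```

Since each attribute is checked on its own, a point that is unusual only in the combination of its attribute values (while ordinary in each one separately) will not be flagged by this sketch.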
4. Nonparametric methods
In nonparametric anomaly detection methods, the model of "normal data" is learned from the input data rather than assumed a priori. Nonparametric methods usually make fewer assumptions about the data, so they can be applied in more situations.
Example: using a histogram to detect outliers.
The histogram is a frequently used nonparametric statistical model that can be used to detect outliers. The process consists of the following two steps:
Step 1: Construct the histogram. Build a histogram from the input (training) data. The histogram can be univariate, or multivariate if the input data are multidimensional.
Although nonparametric methods do not assume any a priori statistical model, they often still require the user to supply parameters in order to learn from the data. For example, the user must specify the type of histogram (equal-width or equal-depth) and other parameters (the number of bins or the size of each bin). Unlike in parametric methods, these parameters do not specify the type of data distribution.
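A standard-library sketch of Step 1 showing the two histogram types mentioned above: equal-width bins split the value range into intervals of the same length, while equal-depth bins are placed at sample quantiles so each holds roughly the same number of values. All function names are hypothetical; `n_bins` plays the role of the user-supplied bin count.

```python
import bisect

def equal_width_edges(values, n_bins):
    """Edges splitting [min, max] into n_bins intervals of equal length."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins)] + [hi]

def equal_depth_edges(values, n_bins):
    """Edges at sample quantiles so each bin holds roughly len(values) / n_bins points."""
    s = sorted(values)
    n = len(s)
    return [s[i * n // n_bins] for i in range(n_bins)] + [s[-1]]

def bin_index(x, edges):
    """Index k with edges[k] <= x < edges[k+1]; the last bin also includes its right edge."""
    if x < edges[0] or x > edges[-1]:
        return None
    return min(bisect.bisect_right(edges, x) - 1, len(edges) - 2)

def build_histogram(values, edges):
    """Count how many values fall into each bin."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        k = bin_index(x, edges)
        if k is not None:
            counts[k] += 1
    return counts

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(build_histogram(data, equal_width_edges(data, 3)))  # [9, 0, 1]
print(build_histogram(data, equal_depth_edges(data, 3)))  # [3, 3, 4]
```

On this skewed sample the equal-width histogram leaves its middle bin empty, while the equal-depth histogram adapts its edges to where the data actually lie.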
Step 2: Detect outliers. To decide whether an object is an outlier, check it against the histogram. In the simplest approach, if the object falls into one of the histogram's (non-empty) bins, it is considered normal; otherwise, it is considered an outlier.
In a more sophisticated approach, the histogram can be used to assign each object an outlier score. For example, the outlier score of an object can be the reciprocal of the volume of the bin it falls into, where the volume is the number (or fraction) of training objects in that bin.
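A sketch of Step 2, assuming an equal-width histogram built from training data (rebuilt here so the block runs on its own). `is_outlier` implements the simplest rule (outside the histogram's range or in an empty bin means outlier), and `outlier_score` implements the reciprocal-of-volume score, taking volume to be the fraction of training objects in the bin. Function names are hypothetical.

```python
import bisect
import math

def locate(x, edges):
    """Index of the bin containing x, or None if x lies outside the histogram's range."""
    if x < edges[0] or x > edges[-1]:
        return None
    return min(bisect.bisect_right(edges, x) - 1, len(edges) - 2)

def fit_histogram(train, n_bins):
    """Step 1: equal-width histogram of the training data -> (edges, counts)."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("training data needs at least two distinct values")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    counts = [0] * n_bins
    for x in train:
        counts[locate(x, edges)] += 1
    return edges, counts

def is_outlier(x, edges, counts):
    """Simplest rule: x is normal only if it lands in a non-empty bin."""
    k = locate(x, edges)
    return k is None or counts[k] == 0

def outlier_score(x, edges, counts):
    """1 / (fraction of training objects in x's bin); empty or out-of-range bins score inf."""
    k = locate(x, edges)
    if k is None or counts[k] == 0:
        return math.inf
    return sum(counts) / counts[k]

train = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
edges, counts = fit_histogram(train, n_bins=4)  # counts == [9, 0, 0, 1]
for x in (3, 12, 20, 25):
    print(x, is_outlier(x, edges, counts), outlier_score(x, edges, counts))
```

Note how 20, although isolated, lands in a non-empty bin and passes the simple rule, while its score (10.0, versus about 1.11 for 3) still ranks it as far more suspicious; this is the extra information the scoring approach provides.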
One drawback of using a histogram as a nonparametric model for outlier detection is that it is hard to choose an appropriate bin size. On the one hand, if the bins are too small, many normal objects fall into empty or sparse bins and are mistakenly identified as outliers. On the other hand, if the bins are too large, outlier objects may infiltrate frequent bins and thus pass themselves off as normal.
5. HBOS
HBOS stands for Histogram-based Outlier Score. It combines univariate methods: it cannot model dependencies between features, but it is fast and well suited to large datasets. Its basic assumption is that the dimensions of the dataset are independent of each other. Each dimension is divided into intervals (bins); the higher the density of the bin a value falls into, the lower its outlier score.
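A from-scratch sketch of HBOS in standard-library Python. It builds one equal-width histogram per feature, normalizes each so its tallest bin has height 1, and scores a point as the sum over features of log(1 / height of the bin it falls into), so dense bins contribute little and sparse bins contribute a lot. The class name `SimpleHBOS`, the static equal-width binning, and the `eps` floor for empty or out-of-range bins are assumptions of this sketch; the original HBOS paper also describes dynamic bin widths, and PyOD ships its own implementation as `pyod.models.hbos.HBOS`.

```python
import bisect
import math

def _locate(x, edges):
    """Bin index of x, or None if x lies outside the histogram's range."""
    if x < edges[0] or x > edges[-1]:
        return None
    return min(bisect.bisect_right(edges, x) - 1, len(edges) - 2)

def _equal_width_hist(column, n_bins):
    """Equal-width histogram of one feature, heights normalized so the tallest bin is 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        hi = lo + 1.0  # constant feature: widen to a unit interval
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins)] + [hi]
    counts = [0] * n_bins
    for x in column:
        counts[_locate(x, edges)] += 1
    peak = max(counts)
    return edges, [c / peak for c in counts]

class SimpleHBOS:
    """Minimal HBOS sketch: independent per-feature histograms, score = sum_i log(1 / hist_i(p))."""

    def __init__(self, n_bins=10, eps=1e-3):
        self.n_bins = n_bins
        self.eps = eps  # assumed floor so empty / out-of-range bins give a finite log

    def fit(self, rows):
        # One histogram per feature: the independence assumption of HBOS.
        self.hists = [_equal_width_hist(col, self.n_bins) for col in zip(*rows)]
        return self

    def score(self, row):
        total = 0.0
        for x, (edges, heights) in zip(row, self.hists):
            k = _locate(x, edges)
            h = heights[k] if k is not None else 0.0
            total += math.log(1.0 / max(h, self.eps))
        return total

rows = [(1, 1), (2, 1), (2, 2), (3, 2), (3, 3), (2, 2), (1, 3), (3, 1), (2, 3), (10, 2)]
model = SimpleHBOS(n_bins=3).fit(rows)
print([round(model.score(r), 3) for r in rows])  # (10, 2) gets the highest score
```

Summing logarithms is equivalent to multiplying the inverse bin heights across features, which is where the independence assumption enters; it also keeps the score numerically stable when there are many features.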
6. Practice