[Bayesian classification 3] semi naive Bayesian classifier
2022-06-26 20:37:00 【NoBug】
1. Review of the naive Bayes classifier
1.1 Categories and features
According to Bayesian decision theory, or rather the Bayesian classification principle, the first quantity we obtain is an expected loss: $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$. The Bayes criterion is to minimize the overall risk, which reduces to minimizing the conditional risk on each individual sample: $h^*(x)=\arg\min_{c\in\mathcal{Y}}R(c|x)$.
You can regard $c$ as the "category" and $x$ as the "features". $R(c|x)$ is the risk of a category given its features, e.g. 【good melon: color = dark green, texture = clear, ..., density $\geq 0.697$】, with $c=\{$good melon, bad melon$\}$ and $x=\{$dark green, clear, ..., density$\}$. We compute, from the features, the risk that the category assignment is wrong; the smaller the risk, the more likely the category is correct.
1.2 Risk and probability
According to the misclassification loss, we further derive $R(c|x)=1-P(c|x)$. Here $P(c|x)=P(\text{category}\mid\text{features})$ can be understood as the probability of the category given these features. For example, for 【good melon: color = dark green, texture = clear, ..., density $\geq 0.697$】, we say that a melon with 【color = dark green, texture = clear, ..., density $\geq 0.697$】 is a good melon (with high probability). The category probability is thus continually adjusted according to the features: this is the posterior probability.
Naturally, we hope that the probability of being a good melon given these features is as large as possible, hence: 【$h^*(x)=\arg\max_{c\in\mathcal{Y}}P(c|x)$; if the dataset has several categories, one sample yields one $P(c|x)$ per category, and the sample is assigned to the category whose $P(c|x)$ is largest】. To compute $P(c|x)$ there is Bayes' theorem: 【$P(c|x)=\frac{P(c)P(x|c)}{P(x)}$】. The difficulty lies in the class-conditional probability $P(x|c)$, a joint probability over all attributes (in practice one meets combinatorial explosion, missing values, and so on). To make the posterior $P(c|x)$ tractable, naive Bayes proposes the 【attribute conditional independence assumption】 (attributes are also called features).
1.3 Class conditional probability
The expression of the naive Bayes classifier: 【$h_{nb}(x)=\arg\max_{c\in\mathcal{Y}}\ P(c)\prod_{i=1}^d P(x_i|c)$】. The focus is on the class-conditional probabilities. On a dataset they are basically estimated with the "reduced sample space" method, that is, by counting samples and taking proportions, rather than by applying the conditional-probability formulas from probability theory.
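As an illustration, here is a minimal Python sketch of this counting-and-proportions estimation (the function names and the toy data are illustrative, not from the original):

```python
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate P(c) and the counts behind P(x_i = v | c) by frequency."""
    prior = {c: cnt / len(y) for c, cnt in Counter(y).items()}
    cond = defaultdict(lambda: defaultdict(Counter))   # cond[c][i][v] = count
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            cond[c][i][v] += 1
    return prior, cond, Counter(y)

def nb_predict(x, prior, cond, class_count):
    """Return argmax_c P(c) * prod_i P(x_i | c), using plain proportions."""
    best_c, best_p = None, -1.0
    for c, pc in prior.items():
        p = pc
        for i, v in enumerate(x):
            p *= cond[c][i][v] / class_count[c]
        if p > best_p:
            best_c, best_p = c, p
    return best_c

# Toy usage: three samples, three binary attributes.
X = [(1, 1, 1), (1, 0, 0), (0, 1, 1)]
y = [1, 1, 0]
prior, cond, nc = fit_naive_bayes(X, y)
print(nb_predict((1, 1, 0), prior, cond, nc))   # -> 1
```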
2. Semi-naive Bayesian classifier learning notes
2.1 Introduction
The naive Bayes classifier uses the "attribute independence assumption", which rarely holds in real tasks. People therefore try to relax the attribute conditional independence assumption to some extent, leading to a class of learning methods called "semi-naive Bayes classifiers" (semi-naïve Bayes classifiers). The basic idea: 【properly consider the interdependence information among some attributes】.
2.2 Knowledge cards
1. Semi-naive Bayes classifier: semi-naive Bayes classifiers
2. Also known as: the SNB algorithm
2.3 Semi-naive Bayesian classifier
- Expression of the semi-naive Bayesian classifier
$pa_i$ denotes the attribute that $x_i$ depends on, called the parent attribute of $x_i$:
$$P(c|x)\propto P(c)\prod_{i=1}^d P(x_i|c,pa_i)$$
- One-dependent classifiers
The key problem is how to determine the parent attribute of each attribute; different approaches produce different one-dependent classifiers.
2.4 One-dependent estimator
2.4.1 Brief introduction
- "English:"
One-Dependent Estimator, abbreviated ODE
- "Purpose:"
ODE is the most common strategy for semi-naive Bayes classification, which properly considers the interdependence information among some attributes.
- "ODE principle:"
Assume that each attribute depends on at most one other attribute besides the class.
- "Properly considering attribute interdependence:"
Different strategies produce different one-dependent classifiers. Representatives:
1)SPODE(Super-Parent ODE)
2)TAN(Tree Augmented naive Bayes)
3)AODE(Averaged ODE)
2.4.2 SPODE (super-parent one-dependent estimator)
- Principle
All features depend on the same single feature, called the super-parent (super-parent). The super-parent is then determined by cross-validation: assume each attribute in turn is the super-parent and choose the one that performs best, as in the sketch below.
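A rough Python sketch of this selection, assuming discrete attributes and using leave-one-out accuracy (all helper names are illustrative):

```python
def spode_scores(train, x, sp):
    """Unnormalized P(c) * P(x_sp|c) * prod_{j != sp} P(x_j|c, x_sp) per class.
    Rows of `train` are (x_1, ..., x_d, y); plain proportions, no smoothing."""
    scores = {}
    for c in {row[-1] for row in train}:
        rows_c = [r for r in train if r[-1] == c]
        rows_csp = [r for r in rows_c if r[sp] == x[sp]]
        p = len(rows_c) / len(train) * len(rows_csp) / len(rows_c)
        for j in range(len(x)):
            if j != sp:
                p *= (sum(r[j] == x[j] for r in rows_csp) / len(rows_csp)
                      if rows_csp else 0.0)
        scores[c] = p
    return scores

def pick_super_parent(data):
    """Try each attribute as super-parent; keep the leave-one-out winner."""
    d = len(data[0]) - 1                      # last column is the label
    best_sp, best_acc = None, -1.0
    for sp in range(d):
        hits = 0
        for k, row in enumerate(data):        # hold out one sample at a time
            s = spode_scores(data[:k] + data[k+1:], row[:-1], sp)
            hits += max(s, key=s.get) == row[-1]
        if hits / len(data) > best_acc:
            best_sp, best_acc = sp, hits / len(data)
    return best_sp, best_acc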
- Example
【Dataset】
| $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 |
【Prediction】
| $x_1$ | $x_2$ | $x_3$ | $y$ |
|---|---|---|---|
| 1 | 1 | 0 | ? |
【SPODE formula】
$$P(c|x)\propto P(c)\prod_{j=1}^d P(x_j|c,pa)$$
where $pa$ is the super-parent attribute; the predicted class is the $c\in\mathcal{Y}$ that maximizes this score.
【Prior probability $P(c)$】
$P(y=1)=\frac{4}{10}=0.4$
$P(y=0)=\frac{6}{10}=0.6$
【Class-conditional probability $P(x_j|c,pa)$, assuming first that $x_1$ is the super-parent, $pa=x_1$】
$P(x_1=1|y=1)=\frac{3}{4}$
$P(x_2=1|y=1,x_1=1)=\frac{2}{3}$
$P(x_3=0|y=1,x_1=1)=\frac{1}{3}$
$P(x_1=1|y=0)=\frac{2}{6}=\frac{1}{3}$
$P(x_2=1|y=0,x_1=1)=\frac{1}{2}$
$P(x_3=0|y=0,x_1=1)=\frac{1}{2}$
【Semi-naive Bayes prediction】
$P(y=1|x)\propto 0.4\times\frac{3}{4}\times\frac{2}{3}\times\frac{1}{3}\approx 0.067$
$P(y=0|x)\propto 0.6\times\frac{1}{3}\times\frac{1}{2}\times\frac{1}{2}=0.050$
Therefore, the predicted category of the sample is $y=1$.
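The hand computation above can be checked with a few lines of counting code (a sketch over the table's dataset, exact fractions via the standard library):

```python
from fractions import Fraction as F

# Rows are (x1, x2, x3, y), copied from the dataset table.
data = [(1,1,1,1),(1,0,0,1),(1,1,1,1),(1,0,0,0),(1,1,1,0),
        (0,0,0,0),(0,1,1,0),(0,1,0,1),(0,1,1,0),(0,0,0,0)]
query = (1, 1, 0)                                  # the sample to predict

for c in (1, 0):
    rows_c   = [r for r in data if r[3] == c]      # samples with y = c
    rows_csp = [r for r in rows_c if r[0] == query[0]]   # ... and x1 = 1
    score = F(len(rows_c), len(data))              # P(y=c)
    score *= F(len(rows_csp), len(rows_c))         # P(x1=1 | y=c)
    for j in (1, 2):                               # P(xj | y=c, x1=1)
        score *= F(sum(r[j] == query[j] for r in rows_csp), len(rows_csp))
    print(f"y={c}: {score} = {float(score):.3f}")
# y=1: 1/15 = 0.067 and y=0: 1/20 = 0.050, so predict y=1
```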
2.4.3 AODE (Averaged One-Dependent Estimator)
- Principle
SPODE selects a single model to make predictions, whereas AODE is a one-dependent classifier based on the ensemble learning mechanism. In AODE, each model makes a prediction, and the results are averaged to obtain the final prediction. (AODE tries to build a SPODE with each attribute as the super-parent, then integrates those SPODEs that have enough training-data support into the final result.)
- Understanding the formula
$$P(c|x)\propto \frac{\sum_{i=1}^d P(c,x_i)\prod_{j=1}^d P(x_j|c,x_i)}{d}$$
(the constant factor $\frac{1}{d}$ does not change the argmax, so it is usually dropped)
- Threshold $m'$
For example, in the example above, $P(x_2=1|y=0,x_1=1)=\frac{1}{2}$: the denominator may be as small as 1 or 2, meaning too few usable samples. AODE therefore adds a threshold: a SPODE is used only when the number of samples whose super-parent feature takes the given value is at least $m'$.
- AODE formula
$$P(c|x)\propto \sum_{i=1,\ |D_{x_i}|\geq m'}^{d} P(c,x_i)\prod_{j=1}^d P(x_j|c,x_i)$$
where $D_{x_i}$ denotes the set of training samples whose $i$-th attribute takes the value $x_i$.
- Parameter estimation (with Laplacian correction)
【$P(c,x_i)$】
$$P(c,x_i)=\frac{|D_{c,x_i}|+1}{|D|+N_i}$$
【$P(x_j|c,x_i)$】
$$P(x_j|c,x_i)=\frac{|D_{c,x_i,x_j}|+1}{|D_{c,x_i}|+N_j}$$
Here $|D_{c,x_i}|$ is the number of samples of class $c$ whose $i$-th attribute is $x_i$, $|D_{c,x_i,x_j}|$ additionally requires the $j$-th attribute to be $x_j$, and $N_i$ is the number of possible values of the $i$-th attribute.
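A sketch of the AODE score using these corrected estimates (`m_prime` plays the role of $m'$; variable names are illustrative):

```python
def aode_score(data, x, c, m_prime=1):
    """AODE score for class c: sum over super-parents i with enough support
    of P(c, x_i) * prod_j P(x_j | c, x_i), with the Laplacian corrections
    above. Rows of `data` are (x_1, ..., x_d, y)."""
    D, d = len(data), len(x)
    n_vals = [len({r[i] for r in data}) for i in range(d)]   # N_i
    total = 0.0
    for i in range(d):
        D_xi = [r for r in data if r[i] == x[i]]
        if len(D_xi) < m_prime:                  # threshold |D_{x_i}| >= m'
            continue
        D_cxi = [r for r in D_xi if r[-1] == c]
        term = (len(D_cxi) + 1) / (D + n_vals[i])            # P(c, x_i)
        for j in range(d):                                   # P(x_j | c, x_i)
            match = sum(r[j] == x[j] for r in D_cxi)
            term *= (match + 1) / (len(D_cxi) + n_vals[j])
        total += term
    return total

# Usage on the SPODE example dataset, query (1, 1, 0):
data = [(1,1,1,1),(1,0,0,1),(1,1,1,1),(1,0,0,0),(1,1,1,0),
        (0,0,0,0),(0,1,1,0),(0,1,0,1),(0,1,1,0),(0,0,0,0)]
print(aode_score(data, (1, 1, 0), 1), aode_score(data, (1, 1, 0), 0))
```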
2.4.4 TAN (Tree Augmented Naive Bayes)
- Introduction
Whether in SPODE or in AODE, although the super-parent changes across models, within each model every feature depends on the same super-parent feature. In reality, it is unlikely that every feature depends on a single common feature; rather, each feature may have a different dependency. TAN solves this problem by finding, for each feature, the other feature it is best suited to depend on.
- Principle
TAN builds the dependency structure on the basis of the maximum weighted spanning tree algorithm. It computes the conditional mutual information between every pair of features and, when selecting the dependent feature for each feature, chooses the one with the maximum conditional mutual information (the value of conditional mutual information represents the degree of interdependence between two features given the class).
- Algorithm (the original figure is omitted; the standard TAN procedure is:)
1) compute the conditional mutual information between every pair of attributes;
2) build a complete graph over the attributes with those values as edge weights;
3) extract the maximum weighted spanning tree, pick a root attribute, and orient all edges outward from it;
4) add the class node $y$ and a directed edge from $y$ to every attribute.
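As a sketch of step 1), here is one common empirical form of conditional mutual information, weighted by the joint probability $P(x_i,x_j,c)$ (textbook presentations may weight the terms slightly differently; names are illustrative):

```python
import math
from collections import Counter

def cond_mutual_info(data, i, j):
    """Empirical I(x_i; x_j | y) from counts; rows are (x_1, ..., x_d, y).
    A pattern with zero count simply contributes nothing to the sum."""
    n = len(data)
    n_c   = Counter(r[-1] for r in data)
    n_ic  = Counter((r[i], r[-1]) for r in data)
    n_jc  = Counter((r[j], r[-1]) for r in data)
    n_ijc = Counter((r[i], r[j], r[-1]) for r in data)
    cmi = 0.0
    for (vi, vj, c), cnt in n_ijc.items():
        p_joint = cnt / n                      # P(x_i, x_j, c)
        # P(x_i,x_j|c) / (P(x_i|c) * P(x_j|c)), rearranged into counts
        ratio = (cnt * n_c[c]) / (n_ic[(vi, c)] * n_jc[(vj, c)])
        cmi += p_joint * math.log(ratio)
    return cmi
```

Steps 2) to 4) then run a maximum weighted spanning tree (e.g. Prim's or Kruskal's algorithm) over these pairwise weights.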
3. Extensions of the semi-naive Bayesian classifier
3.1 kDE (k-dependence estimator)
Since relaxing the attribute conditional independence assumption into a one-dependence assumption can improve generalization performance, can we further improve generalization by considering higher-order dependencies among attributes (dependence on multiple attributes)? This amounts to expanding ODE into kDE, where each attribute may depend on at most $k$ other attributes. If the training data are very plentiful, this is worth considering, since the higher-order conditional probabilities require many more samples to estimate reliably.