
[Bayesian classification 3] Semi-naive Bayes classifier

2022-06-26 20:37:00 NoBug

1. Review of the naive Bayes classifier

1.1 Categories and features

   According to Bayesian decision theory (or, put differently, the Bayesian classification principle), the first thing we obtain is an expected loss: 【$R(c_i\mid x)=\sum_{j=1}^N\lambda_{ij}P(c_j\mid x)$】. The Bayes criterion is to minimize the overall risk, which reduces to minimizing the conditional risk for each sample: 【$h^*(x)=\arg\min_{c\in Y}R(c\mid x)$】.
   You can regard $c$ as the "category" and $x$ as the "features". In $R(c\mid x)$, a category comes with its own features, e.g. 【good melon: color = dark green, texture = clear, …, density $\geq$ 0.697】, so $c=\{\text{good melon},\ \text{bad melon}\}$ and $x=\{\text{dark green},\ \text{clear},\ \ldots,\ \text{density}\}$. From the features we compute the risk that a category assignment is wrong; the smaller the risk, the more likely that category is the correct one.

1.2 Risk and probability

   Using the misclassification (0/1) loss, we further derive $R(c\mid x)=1-P(c\mid x)$. Here $P(c\mid x)=P(\text{category}\mid\text{features})$ can be understood as the probability that this category holds given these features. For example, for 【good melon: color = dark green, texture = clear, …, density $\geq$ 0.697】 we say: 【color = dark green, texture = clear, …, density $\geq$ 0.697】 is a good melon (with high probability). In other words, we keep adjusting the probability of each category according to the features; this is the posterior probability.
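   For concreteness, here is the one-line derivation that the text skips over, using the standard 0/1 misclassification loss ($\lambda_{ij}=0$ if $i=j$ and $\lambda_{ij}=1$ otherwise):

$R(c_i\mid x)=\sum_{j=1}^{N}\lambda_{ij}P(c_j\mid x)=\sum_{j\neq i}P(c_j\mid x)=1-P(c_i\mid x)$

so minimizing the conditional risk is the same as maximizing the posterior probability.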
   Naturally, we want the probability of being a good melon given these features to be as large as possible, so we have 【$h^*(x)=\arg\max_{c\in Y}P(c\mid x)$: if the dataset has several categories, each sample has one $P(c\mid x)$ per category; we compare these values, pick the largest, and the sample is classified into that category】. To compute $P(c\mid x)$ we have Bayes' rule: 【$P(c\mid x)=\frac{P(c)P(x\mid c)}{P(x)}$】. Here $P(x\mid c)$ is a joint probability over all attributes (in practice we run into combinatorial explosion, missing values, and so on), so the difficulty lies in estimating the class-conditional probability $P(x\mid c)$. Naive Bayes tackles this difficulty of computing the posterior $P(c\mid x)$ by introducing the 【attribute conditional independence assumption】 (attributes are also called features).

1.3 Class conditional probability

   The naive Bayes classifier has the expression 【$h_{nb}(x)=\arg\max_{c\in Y}\ P(c)\prod_{i=1}^{d}P(x_i\mid c)$】. The focus is on the class-conditional probabilities: on an actual dataset we basically estimate them with the "reduced sample space" method, i.e. by counting, rather than by applying conditional-probability formulas from probability theory. Simply put: count the samples, then take proportions.
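   As a minimal sketch of this counting procedure (my own illustration, not code from the original post; Laplace smoothing is omitted for brevity), training is just grouping the samples by class and taking proportions:

```python
from collections import Counter, defaultdict

def nb_train(X, y):
    """Estimate P(c) and P(x_i | c) by counting in the reduced sample space:
    restrict attention to the samples of class c, then take proportions."""
    n, d = len(y), len(X[0])
    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    cond = defaultdict(dict)                      # cond[c][(i, v)] = P(x_i = v | c)
    for c in priors:
        rows = [x for x, label in zip(X, y) if label == c]
        for i in range(d):
            for v, cnt in Counter(x[i] for x in rows).items():
                cond[c][(i, v)] = cnt / len(rows)
    return priors, cond

def nb_predict(x, priors, cond):
    """h_nb(x) = argmax_c  P(c) * prod_i P(x_i | c); unseen values get probability 0."""
    def score(c):
        s = priors[c]
        for i, v in enumerate(x):
            s *= cond[c].get((i, v), 0.0)
        return s
    return max(priors, key=score)
```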

2. Semi-naive Bayes classifier: learning notes

2.1 Introduction

   The naive Bayes classifier relies on the "attribute conditional independence assumption", which rarely holds in real tasks. Therefore, people try to relax this assumption to some extent, and this leads to a class of learning methods called "semi-naive Bayes classifiers" (semi-naïve Bayes classifiers). The basic idea: 【properly take into account the interdependence information among a subset of attributes】.

2.2 Knowledge cards

	1.  Semi-naive Bayes classifier: semi-naïve Bayes classifiers
	2.  Also called: the SNB algorithm

2.3 Semi-naive Bayes classifier

  • Expression of the semi-naive Bayes classifier

$pa_i$ denotes the attribute that $x_i$ depends on, called the parent attribute of $x_i$.
$P(c\mid x)\propto P(c)\prod_{i=1}^{d}P(x_i\mid c,pa_i)$

  • One-dependent classifiers

   The key question is how to determine the parent attribute of each attribute; different choices produce different one-dependent classifiers. A minimal scoring sketch is given below.
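   The sketch below (my own naming; plain frequency estimates, no smoothing) shows that a one-dependent classifier only needs a map from each attribute to its parent; the scoring then mirrors naive Bayes with one extra conditioning variable. For a three-attribute dataset, SPODE with $x_1$ as super-parent corresponds to `parents = [None, 0, 0]`.

```python
def ode_score(query, label, parents, X, y):
    """Unnormalized one-dependent score  P(c) * prod_j P(x_j | c, x_{pa(j)}).

    parents[j] is the index of x_j's parent attribute, or None if x_j has no
    parent (its factor then degenerates to P(x_j | c), as in naive Bayes).
    Plain frequency estimates; real use would add Laplace smoothing.
    """
    rows_c = [x for x, lab in zip(X, y) if lab == label]
    score = len(rows_c) / len(y)                              # P(c)
    for j, pa in enumerate(parents):
        # restrict to the class, and additionally to the parent's value if any
        pool = rows_c if pa is None else [x for x in rows_c if x[pa] == query[pa]]
        if not pool:
            return 0.0
        score *= sum(1 for x in pool if x[j] == query[j]) / len(pool)
    return score
```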

2.4 One-dependent estimators (ODE)

2.4.1 Brief introduction

- " english :"
	One-Dependent Estimator, abbreviation :ODE

- " purpose :"
	 It is a semi naive Bayesian classifier , Due consideration should be given to the interdependent information among some attributes , One of the most common strategies .

- "ODE principle :"
	 Suppose that each attribute depends on at most one other attribute outside the category 

- " Due consideration should be given to the interdependent information among some attributes :"
	 Consider different strategies , The independent dependent classifiers formed are also different . representative :
	1)SPODE(Super-Parent ODE)
	2)TAN(Tree Augmented naive Bayes3)AODE(Average ODE)

2.4.2 SPODE (Super-Parent One-Dependent Estimator)

  • Principle

   All attributes depend on one and the same attribute, which is called the super-parent (super-parent). The super-parent is then identified through cross-validation or other model selection methods (assume in turn that each attribute is the super-parent, and keep the one that works best).

  • Figure 1

(figure not reproduced here)

  • Example

【Dataset】

| $x_1$ | $x_2$ | $x_3$ | $y$ |
| --- | --- | --- | --- |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 0 | 0 |

【Prediction】

| $x_1$ | $x_2$ | $x_3$ | $y$ |
| --- | --- | --- | --- |
| 1 | 1 | 0 | ? |

【SPODE formula】

$P(c\mid x)\propto P(c)\prod_{j=1}^{d}P(x_j\mid c,pa_j)$, and the predicted class is the $c\in Y$ that maximizes this score. With super-parent $x_{sp}$ (here $x_1$), every parent $pa_j=x_{sp}$, and the super-parent's own factor reduces to $P(x_{sp}\mid c)$.

【Prior probability $P(c)$】
$P(y=1)=\frac{4}{10}=0.4$

$P(y=0)=\frac{6}{10}=0.6$

【Class-conditional probability $P(x_i\mid c,pa_i)$ (assume first that $x_1$ is the super-parent, i.e. $pa_i=x_1$)】
$P(x_1=1\mid y=1)=\frac{3}{4}$

$P(x_2=1\mid y=1,x_1=1)=\frac{2}{3}$

$P(x_3=0\mid y=1,x_1=1)=\frac{1}{3}$

$P(x_1=1\mid y=0)=\frac{2}{6}=\frac{1}{3}$

$P(x_2=1\mid y=0,x_1=1)=\frac{1}{2}$

$P(x_3=0\mid y=0,x_1=1)=\frac{1}{2}$

【Semi-naive Bayes (SPODE) prediction】
$P(y=1\mid x)\propto 0.4\times 0.75\times\frac{2}{3}\times\frac{1}{3}\approx 0.067$

$P(y=0\mid x)\propto 0.6\times\frac{1}{3}\times\frac{1}{2}\times\frac{1}{2}=0.050$

Since $0.067>0.050$, the predicted class of the sample is $y=1$.
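   To double-check the arithmetic, here is a small self-contained script (my own sketch; plain counting with no smoothing, exactly as in the hand calculation) that reproduces the two scores on the toy dataset:

```python
# Toy dataset from the table above: rows are (x1, x2, x3); y is the label column.
X = [(1, 1, 1), (1, 0, 0), (1, 1, 1), (1, 0, 0), (1, 1, 1),
     (0, 0, 0), (0, 1, 1), (0, 1, 0), (0, 1, 1), (0, 0, 0)]
y = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]

def spode_score(query, label, sp, X, y):
    """P(c) * P(x_sp | c) * prod_{j != sp} P(x_j | c, x_sp), by plain counting."""
    rows_c = [x for x, lab in zip(X, y) if lab == label]          # samples of class c
    rows_c_sp = [x for x in rows_c if x[sp] == query[sp]]         # ... with x_sp = query value
    score = len(rows_c) / len(y)                                  # P(c)
    score *= len(rows_c_sp) / len(rows_c)                         # P(x_sp | c)
    for j in range(len(query)):
        if j != sp:
            score *= sum(1 for x in rows_c_sp if x[j] == query[j]) / len(rows_c_sp)
    return score

query = (1, 1, 0)
scores = {c: spode_score(query, c, sp=0, X=X, y=y) for c in (0, 1)}
print(scores)                          # {0: 0.05, 1: 0.0666...}
print(max(scores, key=scores.get))     # predicted class: 1
```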

2.4.3 AODE (Averaged One-Dependent Estimator)

  • Principle

  SPODE selects a single model to make predictions, whereas AODE is a one-dependent classifier built on an ensemble learning mechanism: each model makes a prediction, and the results are then averaged (summed) to obtain the final prediction. (AODE tries to build a SPODE with each attribute as the super-parent, and then combines those SPODEs that have enough training data to support them into the final result.)

  • Understanding the formula

$P(c\mid x)\propto \frac{1}{d}\sum_{i=1}^{d}P(c,x_i)\prod_{j=1}^{d}P(x_j\mid c,x_i)$

  • Threshold $m'$

   For example, in the example above, $P(x_2=1\mid y=0,x_1=1)=\frac{1}{2}$ is estimated from only two samples; in general the denominator may be very small (even 1), so too few samples support the estimate. Therefore, AODE adds a threshold: a SPODE is used only when the number of samples whose super-parent attribute takes the given value is at least $m'$.

  • AODE formula

$P(c\mid x)\propto \sum_{i=1}^{d}P(c,x_i)\prod_{j=1}^{d}P(x_j\mid c,x_i),\quad |D_{x_i}|\geq m'$

  • Parameter estimation

Estimate of $P(c,x_i)$ (with Laplace correction; $N$ is the number of classes, $N_i$ is the number of possible values of the $i$-th attribute, and $D_{c,x_i}$ is the set of training samples of class $c$ whose $i$-th attribute takes value $x_i$):

$\hat P(c,x_i)=\frac{|D_{c,x_i}|+1}{|D|+N\times N_i}$

Estimate of $P(x_j\mid c,x_i)$ (where $D_{c,x_i,x_j}$ additionally requires the $j$-th attribute to take value $x_j$):

$\hat P(x_j\mid c,x_i)=\frac{|D_{c,x_i,x_j}|+1}{|D_{c,x_i}|+N_j}$
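   Putting the two estimates and the threshold together, a minimal AODE scoring sketch might look as follows (my own code, not from the original post; it implements the displayed formula literally, summing over every usable super-parent):

```python
def aode_score(query, label, X, y, m_prime=1):
    """Unnormalized AODE score: sum over super-parents i with |D_{x_i}| >= m' of
    P(c, x_i) * prod_j P(x_j | c, x_i), using the Laplace-corrected estimates."""
    n, d = len(X), len(X[0])
    n_classes = len(set(y))
    n_values = [len({x[i] for x in X}) for i in range(d)]      # N_i for each attribute
    total = 0.0
    for i in range(d):
        D_xi = [k for k in range(n) if X[k][i] == query[i]]
        if len(D_xi) < m_prime:                                # skip weakly supported SPODEs
            continue
        D_c_xi = [k for k in D_xi if y[k] == label]
        term = (len(D_c_xi) + 1) / (n + n_classes * n_values[i])       # P(c, x_i)
        for j in range(d):
            match = sum(1 for k in D_c_xi if X[k][j] == query[j])
            term *= (match + 1) / (len(D_c_xi) + n_values[j])          # P(x_j | c, x_i)
        total += term
    return total
```

With the toy dataset and query from the SPODE example, comparing `aode_score(query, 1, X, y)` and `aode_score(query, 0, X, y)` gives the AODE prediction in the same way.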

2.4.4 TAN (Tree Augmented Naive Bayes)

  • Introduction

   Whether in SPODE or in AODE, although the super-parent varies across models, within each single model every attribute depends on the same super-parent attribute. In reality, it is unlikely that all attributes depend on one common attribute; rather, each attribute may have a different dependency. TAN addresses this by finding, for each attribute, the other attribute it most strongly depends on.

  • Principle

   TAN builds the dependencies on the basis of a maximum weighted spanning tree algorithm. It computes the conditional mutual information between every pair of attributes and, when choosing the attribute that each attribute depends on, picks the one with the maximum conditional mutual information (the value of the conditional mutual information reflects how strongly two attributes depend on each other given the class).

  • Algorithm

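   The algorithm figure is not reproduced here, so the sketch below shows the structure-learning step as I understand it: estimate the conditional mutual information $I(x_i;x_j\mid y)$ for every attribute pair (standard definition, weighted by the joint $P(x_i,x_j,y)$), build a maximum weighted spanning tree with a Prim-style greedy pass, and orient the edges away from an arbitrarily chosen root attribute. Function names and the choice of root are mine.

```python
import math
from collections import Counter

def cond_mutual_info(X, y, a, b):
    """I(x_a; x_b | y), estimated from counts."""
    n = len(y)
    c_y = Counter(y)
    c_ay = Counter((x[a], lab) for x, lab in zip(X, y))
    c_by = Counter((x[b], lab) for x, lab in zip(X, y))
    c_aby = Counter((x[a], x[b], lab) for x, lab in zip(X, y))
    mi = 0.0
    for (va, vb, lab), cnt in c_aby.items():
        p_joint = cnt / n                                  # P(x_a, x_b, y)
        p_ab_y = cnt / c_y[lab]                            # P(x_a, x_b | y)
        p_a_y = c_ay[(va, lab)] / c_y[lab]                 # P(x_a | y)
        p_b_y = c_by[(vb, lab)] / c_y[lab]                 # P(x_b | y)
        mi += p_joint * math.log(p_ab_y / (p_a_y * p_b_y))
    return mi

def tan_parents(X, y, root=0):
    """Return parents[j] = parent attribute of x_j in the TAN tree (None for root)."""
    d = len(X[0])
    weight = {(i, j): cond_mutual_info(X, y, i, j)
              for i in range(d) for j in range(i + 1, d)}
    w = lambda i, j: weight[(min(i, j), max(i, j))]
    parents, in_tree = [None] * d, {root}
    while len(in_tree) < d:                                # Prim-style max spanning tree
        i, j = max(((i, j) for i in in_tree for j in range(d) if j not in in_tree),
                   key=lambda e: w(*e))
        parents[j] = i                                     # orient the edge away from root
        in_tree.add(j)
    return parents
```

The returned `parents` list is exactly the kind of parent map that the `ode_score` sketch in Section 2.3 expects, so the learned tree can be plugged straight into that scoring function.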

3. Extensions of the semi-naive Bayes classifier

3.1 kDE (k-dependent estimator)

   Since relaxing the attribute conditional independence assumption into a one-dependence assumption can improve generalization, can we consider higher-order dependencies among attributes (dependence on several attributes) to improve generalization further? This amounts to extending ODE into kDE. It is worth considering only if the training data are very plentiful: as $k$ grows, the number of conditional probabilities to estimate explodes, and with limited data the estimates become unreliable.

Copyright notice: this article was created by NoBug; when reprinting, please include a link to the original: https://yzsam.com/2022/177/202206262021251503.html