
[Bayesian classification 2] naive Bayesian classifier

2022-06-26 20:38:00 NoBug

1. Review of Bayesian decision theory

1.1 Classification principle

   Classification principle. Let $Y = \{c_1, c_2, ..., c_N\}$ be the set of $N$ possible class marks, and let $\lambda_{ij}$ be the loss incurred by misclassifying a sample whose true class is $c_j$ as $c_i$. Then $R(c_i|x)$ is the expected loss of classifying sample $x$ as $c_i$ (also called the conditional risk of sample $x$). For a candidate mark $c_i$, we compute this expected loss and then choose the mark that makes it smallest. The defining formula is $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$, where $P(c_j|x)$ is the posterior probability that sample $x$ belongs to class $c_j$.
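To make the conditional-risk formula concrete, here is a minimal sketch. The 3-class loss matrix `lam` and the posteriors `posterior` are made-up illustrative numbers, not taken from the article:

```python
import numpy as np

# Hypothetical loss matrix: lam[i, j] is the loss of deciding class c_i
# when the true class is c_j (3 classes in this toy example).
lam = np.array([[0.0, 1.0, 2.0],
                [1.0, 0.0, 1.0],
                [2.0, 1.0, 0.0]])

# Hypothetical posterior probabilities P(c_j | x) for one sample x.
posterior = np.array([0.2, 0.5, 0.3])

# Conditional risk R(c_i | x) = sum_j lam[i, j] * P(c_j | x)
risk = lam @ posterior
print(risk)           # expected loss of each candidate decision
print(risk.argmin())  # the decision with minimal conditional risk
```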

1.2 Bayesian classifier

   Bayesian classifier. Applying the above principle, samples with $N$ possible marks can be divided into $N$ classes $\{c_1, c_2, ..., c_N\}$, collectively denoted $c$. "As long as, for each sample $x$, we select the class mark that minimizes the conditional risk $R(c|x)$, the overall risk is minimized; this is the Bayes decision rule." The Bayes optimal classifier is $h^*(x)=\arg\min_{c\in Y}R(c|x)$.

1.3 P(c|x)

   $h^*(x)=\arg\max_{c\in Y}P(c|x)$. Proof: take the 0/1 misclassification loss $\lambda_{ij} = 0$ if $i=j$ and $1$ otherwise. Substituting into $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$ gives $R(c_i|x)=\sum_{j\neq i}P(c_j|x)$, and since $\sum_{j=1}^NP(c_j|x)=1$, we end up with $R(c|x)=1-P(c|x)$. Minimizing the conditional risk is therefore equivalent to maximizing the posterior $P(c|x)$.
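The 0/1-loss reduction can be checked numerically. This small sketch reuses the hypothetical posteriors from above and verifies that minimizing the risk picks the same class as maximizing the posterior:

```python
import numpy as np

posterior = np.array([0.2, 0.5, 0.3])       # hypothetical P(c_j | x)
N = len(posterior)

# 0/1 loss: 0 on the diagonal (i = j), 1 everywhere else.
lam01 = np.ones((N, N)) - np.eye(N)

risk = lam01 @ posterior                    # R(c_i | x) = sum_{j != i} P(c_j | x)
assert np.allclose(risk, 1.0 - posterior)   # ... = 1 - P(c_i | x)
assert risk.argmin() == posterior.argmax()  # minimizing risk = maximizing posterior
```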

1.4 Calculation formula

   Calculation formula. $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$, where $P(c)$ is the class prior probability, $P(x|c)$ is the class-conditional probability of sample $x$ given the class mark $c$ (also called the "likelihood"), and $P(x)$ is the evidence, which is the same for every class. The prior $P(c)$ is estimated as the proportion of each class in the sample space; the likelihood $P(x|c)$ is estimated by maximum likelihood estimation.
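A toy numerical check of the formula, with made-up priors and likelihoods for two classes (none of these numbers come from the article):

```python
# Two made-up classes with priors P(c1)=0.6, P(c2)=0.4 and likelihoods
# P(x|c1)=0.2, P(x|c2)=0.5 for some fixed sample x.
prior = {"c1": 0.6, "c2": 0.4}
likelihood = {"c1": 0.2, "c2": 0.5}

# P(x) is the same for every class: sum_c P(c) * P(x|c)
evidence = sum(prior[c] * likelihood[c] for c in prior)

# Bayes' rule: P(c|x) = P(c) * P(x|c) / P(x)
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}
print(posterior)   # {'c1': 0.375, 'c2': 0.625}, and the values sum to 1
```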

1.5 Maximum likelihood estimation

   Maximum likelihood estimation. To estimate $P(x|c)$, let $D_c$ denote the set of class-$c$ samples in the training set $D$, and assume these samples are independent and identically distributed. The likelihood is $P(D_c|c)=\prod_{x\in D_c}P(x|c)$. Because a long product of probabilities easily causes numerical underflow, the log-likelihood $\log P(D_c|c)=\sum_{x\in D_c}\log P(x|c)$ is used instead, and we seek the parameters that maximize it.
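To make the log-likelihood concrete, here is a minimal sketch that fits a Gaussian class-conditional $P(x|c)$ to one continuous attribute by maximum likelihood; the values in `x_c` are illustrative, not taken from the article's dataset:

```python
import numpy as np

# Illustrative continuous attribute values of the class-c samples D_c.
x_c = np.array([0.697, 0.774, 0.634, 0.608, 0.556, 0.403, 0.481, 0.437])

# Maximum likelihood estimates for a Gaussian class-conditional P(x | c):
mu_hat = x_c.mean()    # MLE of the mean
var_hat = x_c.var()    # MLE of the variance (divides by n, the MLE form)

# log P(D_c | c) = sum_x log N(x; mu, var); taking logs turns the product
# into a sum and avoids numerical underflow.
log_lik = np.sum(-0.5 * np.log(2 * np.pi * var_hat)
                 - (x_c - mu_hat) ** 2 / (2 * var_hat))
print(mu_hat, var_hat, log_lik)
```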

2. Naive Bayesian classifier learning notes

2.1 Introduction

   $h^*(x)=\arg\max_{c\in Y}P(c|x)$ with $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$. The main difficulty is estimating the class-conditional probability $P(x|c)$, which is the joint probability over all attributes (computing it directly runs into a combinatorial explosion, and the data will contain attribute-value combinations that are never observed). The naive Bayes classifier therefore assumes: "given the class, all attributes are independent of each other" (the attribute conditional independence assumption).

2.2 Knowledge cards

	1. English name: Naive Bayes classifier
	2. Also called: the NB algorithm

2.3 Naive Bayes classifier

  • Calculate the class-conditional probability $P(x|c)$

$d$ is the number of attributes, and $x_i$ is the value of $x$ on the $i$-th attribute:
$P(x|c)=\prod_{i=1}^dP(x_i|c)$

  • Expression of naive Bayesian classifier

As in the review above, $P(x)$ is the same for every class, so
$h_{nb}(x)=\arg\max_{c\in Y}\ P(c)\prod_{i=1}^dP(x_i|c)$

  • Calculate the prior probability $P(c)$

$P(c)=\frac{|D_c|}{|D|}$

  • Calculate the class-conditional probability $P(x_i|c)$

(Discrete attributes)
$P(x_i|c)=\frac{|D_{c,x_i}|}{|D_c|}$

(Continuous attributes)
For continuous attributes a probability density function can be used instead, for example the two-point (Bernoulli), geometric, binomial, exponential, Poisson, or normal distribution. A code sketch combining these estimates follows below.
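Putting the pieces above together (prior, per-attribute conditionals, and the product rule), here is a minimal sketch of a naive Bayes classifier for discrete attributes. The data format (a list of `(attribute_dict, label)` pairs) and the toy samples at the end are assumptions made for illustration; a continuous attribute would use a density such as the Gaussian instead of counting:

```python
from collections import Counter, defaultdict

def train_nb(data):
    """data: list of (attribute_dict, label) pairs with discrete attribute values."""
    labels = [y for _, y in data]
    n = len(data)
    class_count = Counter(labels)
    prior = {c: cnt / n for c, cnt in class_count.items()}    # P(c) = |D_c| / |D|
    counts = defaultdict(lambda: defaultdict(Counter))        # |D_{c, x_i}|
    for x, y in data:
        for attr, value in x.items():
            counts[y][attr][value] += 1
    # Normalise the counts to P(x_i | c) = |D_{c, x_i}| / |D_c|.
    cond = {c: {attr: {v: k / class_count[c] for v, k in vc.items()}
                for attr, vc in attrs.items()}
            for c, attrs in counts.items()}
    return prior, cond

def predict_nb(x, prior, cond):
    # h_nb(x) = argmax_c  P(c) * prod_i P(x_i | c)
    def score(c):
        s = prior[c]
        for attr, value in x.items():
            s *= cond[c].get(attr, {}).get(value, 0.0)  # unseen value -> 0 (see Laplacian smoothing)
        return s
    return max(prior, key=score)

# Toy usage with made-up melons:
data = [({"color": "green", "sound": "dull"}, "yes"),
        ({"color": "black", "sound": "dull"}, "yes"),
        ({"color": "white", "sound": "crisp"}, "no")]
prior, cond = train_nb(data)
print(predict_nb({"color": "green", "sound": "dull"}, prior, cond))   # -> "yes"
```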

  • Example

【Data set D】

[figure: the watermelon training set D]
【Prior probability $P(c)$】

$P(\text{good melon}=\text{yes})=\frac{8}{17}\approx0.471$

$P(\text{good melon}=\text{no})=\frac{9}{17}\approx0.529$

【Class-conditional probabilities $P(x_i|c)$】

[figure: class-conditional probability estimates for the test sample]

【 Naive Bayes classifier 】

[figure: posterior scores computed by the naive Bayes classifier]

Because $0.038 > 6.80\times10^{-5}$, the naive Bayes classifier judges the test sample "test 1" to be a "good melon".

2.4 Laplacian smoothing

  • Motivating problem
       In practice we will certainly encounter zero counts. For example, among the 8 samples with "good melon = yes" there is no sample with "knock sound = crisp", so $P_{\text{crisp}|\text{yes}}=P(\text{knock sound}=\text{crisp}\mid\text{good melon}=\text{yes})=\frac{0}{8}=0$. Because this factor enters the product, no matter what the other attributes of the sample are, even if it looks like a good melon on every other attribute, the classification result will be "good melon = no", which is obviously unreasonable.

  • Laplacian smoothing
       Add 1 to the numerator and add K to the denominator: K is the number of classes when estimating the prior $P(c)$, and the number of possible values of attribute $i$ when estimating $P(x_i|c)$ (a code sketch follows the example below).

  • Example

[figure: Laplacian-smoothed estimates for the same example]
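A minimal sketch of the smoothed estimates described above; the helper names and the assumption that the "knock sound" attribute has 3 possible values are mine, not from the article:

```python
def smoothed_prior(count_c, total, num_classes):
    # P(c) = (|D_c| + 1) / (|D| + N), where N is the number of classes
    return (count_c + 1) / (total + num_classes)

def smoothed_conditional(count_c_xi, count_c, num_values_i):
    # P(x_i | c) = (|D_{c, x_i}| + 1) / (|D_c| + N_i),
    # where N_i is the number of possible values of attribute i
    return (count_c_xi + 1) / (count_c + num_values_i)

# The zero-count case from the text: "knock sound = crisp" never occurs among
# the 8 "good melon = yes" samples; assuming the attribute has 3 possible values:
print(smoothed_conditional(0, 8, 3))   # 1/11 ≈ 0.091 instead of 0
```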

3. Naive Bayesian classifier extension

3.1 Data processing

   There are several ways to use the naive Bayes classifier in real tasks.
   For example, if the task demands fast prediction, then for a given training set all the probability estimates involved in the naive Bayes classifier can be computed and stored in advance, so that prediction only requires "looking up the table";
   if the task data changes frequently, "lazy learning" (lazy learning) can be used: no training is done up front, and the probability estimates are computed from the current data set when a prediction request arrives;
   if the data keeps growing, incremental learning can be achieved by updating, on top of the existing estimates, only the counts and probability estimates involved in the attribute values of the new samples. A sketch of the look-up-table and incremental ideas follows below.
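As a sketch of the "precompute / look-up table" and incremental-learning ideas, assuming the classifier stores raw counts (the class names and attribute format are illustrative, not from the article):

```python
from collections import Counter, defaultdict

class IncrementalNB:
    """Stores raw counts so that predictions are table look-ups and a new
    labelled sample only increments the counts it touches."""

    def __init__(self):
        self.total = 0                            # |D|
        self.class_count = Counter()              # |D_c|
        self.value_count = defaultdict(Counter)   # |D_{c, x_i}|, keyed by (class, attribute)

    def update(self, x, y):
        # Incremental learning: only the counts involved in the new sample change.
        self.total += 1
        self.class_count[y] += 1
        for attr, value in x.items():
            self.value_count[(y, attr)][value] += 1

    def score(self, x, c):
        # Unnormalised P(c) * prod_i P(x_i | c), read straight from the stored counts.
        s = self.class_count[c] / self.total
        for attr, value in x.items():
            s *= self.value_count[(c, attr)][value] / self.class_count[c]
        return s
```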

3.2 Collect other information

【 The core of naive Bayesian classifier 】

$h^*(x)=\arg\max_{c\in Y}P(c|x)$, where $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$

$P(\text{category}\mid\text{features})=\frac{P(\text{category})\,P(\text{features}\mid\text{category})}{P(\text{features})}$

【 Advantages and disadvantages of naive Bayesian classifier 】

- " advantage :"
	1.  The algorithm logic is simple , Easy to implement 
	2.  The cost of time and space is small in the process of classification 

- " shortcoming :"
	1.  Because naive Bayesian models assume that attributes are independent of each other , This assumption is often not true in practical application .
	2.  When the number of attributes is large or the correlation between attributes is large , The classification effect is not good .

- " solve :"
	1.  about nb Disadvantages of the algorithm , There are algorithms like semi naive Bayes that are moderately improved by considering some correlations .

【 Bayesian network 】

See the encyclopedia entry on Bayesian networks.

Original site: https://yzsam.com/2022/177/202206262021251574.html

Copyright notice: this article was created by [NoBug]; please include a link to the original when reposting. Thank you.