
[Bayesian classification 2] naive Bayesian classifier

2022-06-26 20:38:00 NoBug

1. Review of Bayesian decision theory

1.1 Classification principle

   Classification principle. Let $Y = \{c_1, c_2, ..., c_N\}$ be the set of $N$ possible class marks, and let $\lambda_{ij}$ be the loss incurred by misclassifying a sample whose true class is $c_j$ as $c_i$. Then $R(c_i|x)$ is the expected loss of classifying sample $x$ as $c_i$ (also called the conditional risk of sample $x$). For a candidate mark $c_i$, we compute this expected loss and then choose the mark that makes it smallest. The defining formula is $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$, where $P(c_j|x)$ is the posterior probability that sample $x$ belongs to class $c_j$.
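To make the conditional-risk formula concrete, here is a minimal sketch. The 3-class loss matrix `lam` and the posteriors `posterior` are made-up illustrative numbers, not taken from the article:

```python
import numpy as np

# Hypothetical loss matrix: lam[i, j] is the loss of deciding class c_i
# when the true class is c_j (3 classes in this toy example).
lam = np.array([[0.0, 1.0, 2.0],
                [1.0, 0.0, 1.0],
                [2.0, 1.0, 0.0]])

# Hypothetical posterior probabilities P(c_j | x) for one sample x.
posterior = np.array([0.2, 0.5, 0.3])

# Conditional risk R(c_i | x) = sum_j lam[i, j] * P(c_j | x)
risk = lam @ posterior
print(risk)           # expected loss of each candidate decision
print(risk.argmin())  # the decision with minimal conditional risk
```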

1.2 Bayesian classifier

   Bayesian classifier. Applying the above principle, samples with $N$ possible marks can be divided into $N$ classes $\{c_1, c_2, ..., c_N\}$, collectively denoted $c$. "As long as, for each sample $x$, we select the class mark that minimizes the conditional risk $R(c|x)$, the overall risk is minimized; this is the Bayes decision rule." The Bayes optimal classifier is $h^*(x)=\arg\min_{c\in Y}R(c|x)$.

1.3 P(c|x)

   $h^*(x)=\arg\max_{c\in Y}P(c|x)$. Proof: take the 0/1 misclassification loss $\lambda_{ij} = 0$ if $i=j$ and $1$ otherwise. Substituting into $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$ gives $R(c_i|x)=\sum_{j\neq i}P(c_j|x)$, and since $\sum_{j=1}^NP(c_j|x)=1$, we end up with $R(c|x)=1-P(c|x)$. Minimizing the conditional risk is therefore equivalent to maximizing the posterior $P(c|x)$.
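The 0/1-loss reduction can be checked numerically. This small sketch reuses the hypothetical posteriors from above and verifies that minimizing the risk picks the same class as maximizing the posterior:

```python
import numpy as np

posterior = np.array([0.2, 0.5, 0.3])       # hypothetical P(c_j | x)
N = len(posterior)

# 0/1 loss: 0 on the diagonal (i = j), 1 everywhere else.
lam01 = np.ones((N, N)) - np.eye(N)

risk = lam01 @ posterior                    # R(c_i | x) = sum_{j != i} P(c_j | x)
assert np.allclose(risk, 1.0 - posterior)   # ... = 1 - P(c_i | x)
assert risk.argmin() == posterior.argmax()  # minimizing risk = maximizing posterior
```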

1.4 Calculation formula

   Calculation formula. $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$, where $P(c)$ is the class prior probability, $P(x|c)$ is the class-conditional probability of sample $x$ given the class mark $c$ (also called the "likelihood"), and $P(x)$ is the evidence, which is the same for every class. The prior $P(c)$ is estimated as the proportion of each class in the sample space; the likelihood $P(x|c)$ is estimated by maximum likelihood estimation.
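A toy numerical check of the formula, with made-up priors and likelihoods for two classes (none of these numbers come from the article):

```python
# Two made-up classes with priors P(c1)=0.6, P(c2)=0.4 and likelihoods
# P(x|c1)=0.2, P(x|c2)=0.5 for some fixed sample x.
prior = {"c1": 0.6, "c2": 0.4}
likelihood = {"c1": 0.2, "c2": 0.5}

# P(x) is the same for every class: sum_c P(c) * P(x|c)
evidence = sum(prior[c] * likelihood[c] for c in prior)

# Bayes' rule: P(c|x) = P(c) * P(x|c) / P(x)
posterior = {c: prior[c] * likelihood[c] / evidence for c in prior}
print(posterior)   # {'c1': 0.375, 'c2': 0.625}, and the values sum to 1
```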

1.5 Maximum likelihood estimation

   Maximum likelihood estimation. To estimate $P(x|c)$, let $D_c$ denote the set of class-$c$ samples in the training set $D$, and assume these samples are independent and identically distributed. The likelihood is $P(D_c|c)=\prod_{x\in D_c}P(x|c)$. Because a long product of probabilities easily causes numerical underflow, the log-likelihood $\log P(D_c|c)=\sum_{x\in D_c}\log P(x|c)$ is used instead, and we seek the parameters that maximize it.
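To make the log-likelihood concrete, here is a minimal sketch that fits a Gaussian class-conditional $P(x|c)$ to one continuous attribute by maximum likelihood; the values in `x_c` are illustrative, not taken from the article's dataset:

```python
import numpy as np

# Illustrative continuous attribute values of the class-c samples D_c.
x_c = np.array([0.697, 0.774, 0.634, 0.608, 0.556, 0.403, 0.481, 0.437])

# Maximum likelihood estimates for a Gaussian class-conditional P(x | c):
mu_hat = x_c.mean()    # MLE of the mean
var_hat = x_c.var()    # MLE of the variance (divides by n, the MLE form)

# log P(D_c | c) = sum_x log N(x; mu, var); taking logs turns the product
# into a sum and avoids numerical underflow.
log_lik = np.sum(-0.5 * np.log(2 * np.pi * var_hat)
                 - (x_c - mu_hat) ** 2 / (2 * var_hat))
print(mu_hat, var_hat, log_lik)
```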

2. Naive Bayesian classifier learning notes

2.1 Introduction

   $h^*(x)=\arg\max_{c\in Y}P(c|x)$ with $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$. The main difficulty is estimating the class-conditional probability $P(x|c)$, which is the joint probability over all attributes (computing it directly runs into a combinatorial explosion, and the data will contain attribute-value combinations that are never observed). The naive Bayes classifier therefore assumes: "given the class, all attributes are independent of each other" (the attribute conditional independence assumption).

2.2 Knowledge cards

	1. English name: Naive Bayes classifier
	2. Also called: the NB algorithm

2.3 Naive Bayes classifier

  • Calculate the class-conditional probability $P(x|c)$

$d$ is the number of attributes, and $x_i$ is the value of $x$ on the $i$-th attribute:
$P(x|c)=\prod_{i=1}^dP(x_i|c)$

  • Expression of naive Bayesian classifier

As in the review above, $P(x)$ is the same for every class, so
$h_{nb}(x)=\arg\max_{c\in Y}\ P(c)\prod_{i=1}^dP(x_i|c)$

  • Calculate the prior probability $P(c)$

$P(c)=\frac{|D_c|}{|D|}$

  • Calculate the class-conditional probability $P(x_i|c)$

(Discrete attributes)
$P(x_i|c)=\frac{|D_{c,x_i}|}{|D_c|}$

(Continuous attributes)
For continuous attributes a probability density function can be used instead, for example the two-point (Bernoulli), geometric, binomial, exponential, Poisson, or normal distribution. A code sketch combining these estimates follows below.
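Putting the pieces above together (prior, per-attribute conditionals, and the product rule), here is a minimal sketch of a naive Bayes classifier for discrete attributes. The data format (a list of `(attribute_dict, label)` pairs) and the toy samples at the end are assumptions made for illustration; a continuous attribute would use a density such as the Gaussian instead of counting:

```python
from collections import Counter, defaultdict

def train_nb(data):
    """data: list of (attribute_dict, label) pairs with discrete attribute values."""
    labels = [y for _, y in data]
    n = len(data)
    class_count = Counter(labels)
    prior = {c: cnt / n for c, cnt in class_count.items()}    # P(c) = |D_c| / |D|
    counts = defaultdict(lambda: defaultdict(Counter))        # |D_{c, x_i}|
    for x, y in data:
        for attr, value in x.items():
            counts[y][attr][value] += 1
    # Normalise the counts to P(x_i | c) = |D_{c, x_i}| / |D_c|.
    cond = {c: {attr: {v: k / class_count[c] for v, k in vc.items()}
                for attr, vc in attrs.items()}
            for c, attrs in counts.items()}
    return prior, cond

def predict_nb(x, prior, cond):
    # h_nb(x) = argmax_c  P(c) * prod_i P(x_i | c)
    def score(c):
        s = prior[c]
        for attr, value in x.items():
            s *= cond[c].get(attr, {}).get(value, 0.0)  # unseen value -> 0 (see Laplacian smoothing)
        return s
    return max(prior, key=score)

# Toy usage with made-up melons:
data = [({"color": "green", "sound": "dull"}, "yes"),
        ({"color": "black", "sound": "dull"}, "yes"),
        ({"color": "white", "sound": "crisp"}, "no")]
prior, cond = train_nb(data)
print(predict_nb({"color": "green", "sound": "dull"}, prior, cond))   # -> "yes"
```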

  • Example

【Data set D】

[figure: the watermelon training set D]
【Prior probability $P(c)$】

$P(\text{good melon}=\text{yes})=\frac{8}{17}\approx0.471$

$P(\text{good melon}=\text{no})=\frac{9}{17}\approx0.529$

【Class-conditional probabilities $P(x_i|c)$】

[figure: class-conditional probability estimates for the test sample]

【 Naive Bayes classifier 】

[figure: posterior scores computed by the naive Bayes classifier]

Because $0.038 > 6.80\times10^{-5}$, the naive Bayes classifier judges the test sample "test 1" to be a "good melon".

2.4 Laplacian smoothing

  • Motivating problem
       In practice we will certainly encounter zero counts. For example, among the 8 samples with "good melon = yes" there is no sample with "knock sound = crisp", so $P_{\text{crisp}|\text{yes}}=P(\text{knock sound}=\text{crisp}\mid\text{good melon}=\text{yes})=\frac{0}{8}=0$. Because this factor enters the product, no matter what the other attributes of the sample are, even if it looks like a good melon on every other attribute, the classification result will be "good melon = no", which is obviously unreasonable.

  • Laplacian smoothing
       Add 1 to the numerator and add K to the denominator: K is the number of classes when estimating the prior $P(c)$, and the number of possible values of attribute $i$ when estimating $P(x_i|c)$ (a code sketch follows the example below).

  • Example

[figure: Laplacian-smoothed estimates for the same example]
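A minimal sketch of the smoothed estimates described above; the helper names and the assumption that the "knock sound" attribute has 3 possible values are mine, not from the article:

```python
def smoothed_prior(count_c, total, num_classes):
    # P(c) = (|D_c| + 1) / (|D| + N), where N is the number of classes
    return (count_c + 1) / (total + num_classes)

def smoothed_conditional(count_c_xi, count_c, num_values_i):
    # P(x_i | c) = (|D_{c, x_i}| + 1) / (|D_c| + N_i),
    # where N_i is the number of possible values of attribute i
    return (count_c_xi + 1) / (count_c + num_values_i)

# The zero-count case from the text: "knock sound = crisp" never occurs among
# the 8 "good melon = yes" samples; assuming the attribute has 3 possible values:
print(smoothed_conditional(0, 8, 3))   # 1/11 ≈ 0.091 instead of 0
```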

3. Naive Bayesian classifier extension

3.1 Data processing

   There are several ways to use the naive Bayes classifier in real tasks.
   For example, if the task demands fast prediction, then for a given training set all the probability estimates involved in the naive Bayes classifier can be computed and stored in advance, so that prediction only requires "looking up the table";
   if the task data changes frequently, "lazy learning" (lazy learning) can be used: no training is done up front, and the probability estimates are computed from the current data set when a prediction request arrives;
   if the data keeps growing, incremental learning can be achieved by updating, on top of the existing estimates, only the counts and probability estimates involved in the attribute values of the new samples. A sketch of the look-up-table and incremental ideas follows below.
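As a sketch of the "precompute / look-up table" and incremental-learning ideas, assuming the classifier stores raw counts (the class names and attribute format are illustrative, not from the article):

```python
from collections import Counter, defaultdict

class IncrementalNB:
    """Stores raw counts so that predictions are table look-ups and a new
    labelled sample only increments the counts it touches."""

    def __init__(self):
        self.total = 0                            # |D|
        self.class_count = Counter()              # |D_c|
        self.value_count = defaultdict(Counter)   # |D_{c, x_i}|, keyed by (class, attribute)

    def update(self, x, y):
        # Incremental learning: only the counts involved in the new sample change.
        self.total += 1
        self.class_count[y] += 1
        for attr, value in x.items():
            self.value_count[(y, attr)][value] += 1

    def score(self, x, c):
        # Unnormalised P(c) * prod_i P(x_i | c), read straight from the stored counts.
        s = self.class_count[c] / self.total
        for attr, value in x.items():
            s *= self.value_count[(c, attr)][value] / self.class_count[c]
        return s
```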

3.2 Collect other information

【 The core of naive Bayesian classifier 】

$h^*(x)=\arg\max_{c\in Y}P(c|x)$, where $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$

$P(\text{category}\mid\text{features})=\frac{P(\text{category})\,P(\text{features}\mid\text{category})}{P(\text{features})}$

【 Advantages and disadvantages of naive Bayesian classifier 】

- " advantage :"
	1.  The algorithm logic is simple , Easy to implement 
	2.  The cost of time and space is small in the process of classification 

- " shortcoming :"
	1.  Because naive Bayesian models assume that attributes are independent of each other , This assumption is often not true in practical application .
	2.  When the number of attributes is large or the correlation between attributes is large , The classification effect is not good .

- " solve :"
	1.  about nb Disadvantages of the algorithm , There are algorithms like semi naive Bayes that are moderately improved by considering some correlations .

【 Bayesian network 】

See the encyclopedia entry on Bayesian networks.

Original site: https://yzsam.com/2022/177/202206262021251574.html

Copyright notice: this article was created by [NoBug]; please include a link to the original when reposting. Thank you.