
Naive Bayes classifier

2022-06-09 07:10:00 Don't wait for brother shy to develop

1、 Classification concept

   Classification finds a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class label of objects whose label is unknown.

   Classification generally proceeds in two stages:

  • Learning stage:

    • Build a classifier that describes a predefined set of data classes or concepts.
    • The training set provides the class label of each training tuple, so the learning process of classification is also called supervised learning.
  • Classification stage: the process of using the built classifier to classify new data.

   Classification and numerical prediction are different concepts: classification predicts categorical (discrete, unordered) labels, while numerical prediction builds a continuous-valued function model. Classification and clustering are also different: classification is supervised learning and is given the class labels of the training tuples, whereas clustering is unsupervised learning and does not rely on training instances with class labels.

2、 Naive Bayesian classification

2.1 Bayes' theorem

   Bayes' theorem is:

$$P(h\mid D)=\frac{P(D\mid h)\,P(h)}{P(D)}$$

   Here D is the observed data to be classified and h is a candidate hypothesis (class): $P(D\mid h)$ is the likelihood of h, $P(h)$ is the prior probability of h, $P(h\mid D)$ is the posterior probability of h, and $P(D)$ is the prior probability of D.

   Let's start with an example: in a school, 60% of the students are boys and 40% are girls. Boys always wear trousers (pants); half of the girls wear trousers and half wear skirts. If a student wearing trousers is selected at random, what is the probability that the student is a girl?

   The description above can be formalized as:

   Known: P(Boy) = 60%, P(Girl) = 40%, P(Pants|Girl) = 50%, P(Pants|Boy) = 100%. Find: P(Girl|Pants).

   Answer:

$$P(Girl\mid Pants)=\frac{P(Girl)\,P(Pants\mid Girl)}{P(Boy)\,P(Pants\mid Boy)+P(Girl)\,P(Pants\mid Girl)}=\frac{P(Girl)\,P(Pants\mid Girl)}{P(Pants)}=\frac{0.4\times 0.5}{0.6\times 1+0.4\times 0.5}=0.25$$

Intuitive understanding: first figure out how many students in the school wear trousers, then figure out how many of them are girls.
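As a quick sanity check, the arithmetic above can be reproduced in a few lines of Python (a minimal sketch of our own; the variable names are not from the original post):

# Trousers example: P(Girl | Pants) via Bayes' theorem.
p_boy, p_girl = 0.6, 0.4
p_pants_given_boy, p_pants_given_girl = 1.0, 0.5

# Total probability of observing trousers.
p_pants = p_boy * p_pants_given_boy + p_girl * p_pants_given_girl

# Posterior probability that a trouser-wearing student is a girl.
p_girl_given_pants = p_girl * p_pants_given_girl / p_pants
print(p_girl_given_pants)  # 0.25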

   The problem above gives us this observable knowledge: in the school, 60% of students are boys and 40% are girls; boys always wear trousers, while half of the girls wear trousers and half wear skirts. What we cannot observe directly is whether a randomly selected trouser-wearing student is a boy or a girl.

   For what cannot be observed directly, we make assumptions (hypotheses), and for an uncertain quantity there are usually several competing hypotheses.


   Bayes' theorem provides a way to compute the posterior probability $P(h\mid D)$ of a hypothesis: the posterior probability is proportional to the product of the prior probability and the likelihood.

2.2 Maximum a posteriori hypothesis

   The maximum a posteriori learner searches the candidate hypothesis set H for the hypothesis h that is most probable given the data D; this h is called the maximum a posteriori hypothesis (Maximum a posteriori: MAP). To determine the MAP hypothesis, Bayes' theorem is used to compute the posterior probability of each candidate hypothesis, as follows:

$$h_{MAP}=\arg\max_{h\in H}P(h\mid D)=\arg\max_{h\in H}\frac{P(D\mid h)\,P(h)}{P(D)}=\arg\max_{h\in H}P(D\mid h)\,P(h)$$

In the last step $P(D)$ is dropped, because it is a constant that does not depend on h (equivalently, the prior probability of the data is the same for every hypothesis).

2.3 Joint probability of multidimensional attributes

   Known: the object D is a vector of several attributes, $D=\langle a_1,a_2,\dots,a_n\rangle$. Combining this with the maximum a posteriori hypothesis above, our goal can be written as:

$$h_{MAP}=\arg\max_{h\in H}P(h\mid\langle a_1,a_2,\dots,a_n\rangle)=\arg\max_{h\in H}\frac{P(\langle a_1,a_2,\dots,a_n\rangle\mid h)\,P(h)}{P(\langle a_1,a_2,\dots,a_n\rangle)}=\arg\max_{h\in H}P(\langle a_1,a_2,\dots,a_n\rangle\mid h)\,P(h)$$

   But this raises a problem: when estimating $P(\langle a_1,a_2,\dots,a_n\rangle\mid h)$, the available data become sparse as the dimensionality grows, and reliable estimates are hard to obtain. For example, with n binary attributes there are $2^n$ possible attribute combinations, most of which never occur in the training set.

2.4 Independence assumption

   The data-sparsity problem just mentioned can be addressed with an independence assumption: assume the attributes $a_i$ of D are conditionally independent of one another given the hypothesis. The formula above can then be written as:

$$P(\langle a_1,a_2,\dots,a_n\rangle\mid h)=\prod_i P(a_i\mid h)$$

$$h_{MAP}=\arg\max_{h\in H}P(h\mid\langle a_1,a_2,\dots,a_n\rangle)=\arg\max_{h\in H}P(\langle a_1,a_2,\dots,a_n\rangle\mid h)\,P(h)=\arg\max_{h\in H}\prod_i P(a_i\mid h)\,P(h)$$

   With the independence assumption, estimating $P(a_i\mid h)$ is much easier than estimating $P(\langle a_1,a_2,\dots,a_n\rangle\mid h)$. If the attributes of D are not in fact conditionally independent, the result of naive Bayes classification is an approximation of Bayes classification.
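To make the decision rule concrete, here is a minimal sketch of choosing the MAP hypothesis as $\arg\max_h P(h)\prod_i P(a_i\mid h)$; the priors and conditional probabilities below are made-up illustrative numbers, not estimates from any table in this post:

from math import prod

# Made-up priors P(h) and per-attribute conditional probabilities P(a_i | h).
priors = {"h1": 0.6, "h2": 0.4}
likelihoods = {
    "h1": {"a1": 0.2, "a2": 0.7},
    "h2": {"a1": 0.5, "a2": 0.3},
}

def h_map(attrs, priors, likelihoods):
    # Score each hypothesis by P(h) * prod_i P(a_i | h) and return the largest.
    scores = {h: priors[h] * prod(likelihoods[h][a] for a in attrs) for h in priors}
    return max(scores, key=scores.get)

print(h_map(["a1", "a2"], priors, likelihoods))  # "h1": 0.6*0.2*0.7 = 0.084 > 0.4*0.5*0.3 = 0.060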

3、 A Bayesian classification example

   The following training set records statistics about computer purchases. Its attributes are age, income, game hobby, credit rating, and whether a computer was bought.

id   Age           Income   Game hobby   Credit      Buys computer
1    young         high     no           fair        no
2    young         high     no           excellent   no
3    middle-aged   high     no           fair        yes
4    senior        medium   no           fair        yes
5    senior        low      yes          fair        yes
6    senior        low      yes          excellent   no
7    middle-aged   low      yes          excellent   yes
8    young         medium   no           fair        no
9    young         low      yes          fair        yes
10   senior        medium   yes          fair        yes
11   young         medium   yes          excellent   yes
12   middle-aged   medium   no           excellent   yes
13   middle-aged   high     yes          fair        yes
14   senior        medium   no           excellent   no

   Test case X: will a young, medium-income game lover with a fair credit rating buy a computer?

   From the training table above we can extract the tuples of the customers who bought a computer, shown below. They are used to judge whether this young, medium-income, game-loving customer with a fair credit rating will buy a computer.

id   Age           Income   Game hobby   Credit      Buys computer
3    middle-aged   high     no           fair        yes
4    senior        medium   no           fair        yes
5    senior        low      yes          fair        yes
7    middle-aged   low      yes          excellent   yes
9    young         low      yes          fair        yes
10   senior        medium   yes          fair        yes
11   young         medium   yes          excellent   yes
12   middle-aged   medium   no           excellent   yes
13   middle-aged   high     yes          fair        yes

   First, compute the conditional probability of each attribute value of the test case among the customers who bought a computer:

P(young | buy) = 2/9 = 0.222
P(medium income | buy) = 4/9 = 0.444
P(game lover | buy) = 6/9 = 0.667
P(fair credit | buy) = 6/9 = 0.667
   Then, according to the following formula, compute the likelihood of the test case X given that a computer is bought:

$$P(X\mid C_i)=\prod_{k=1}^{n}P(x_k\mid C_i)$$
P(X | buy) = 0.222 × 0.444 × 0.667 × 0.667 ≈ 0.044
   Similarly, we can extract the training tuples of the customers who did not buy a computer:

id   Age           Income   Game hobby   Credit      Buys computer
1    young         high     no           fair        no
2    young         high     no           excellent   no
6    senior        low      yes          excellent   no
8    young         medium   no           fair        no
14   senior        medium   no           excellent   no

   The conditional probabilities of the test-case attribute values among the customers who did not buy a computer are:

P(young | not buy) = 3/5 = 0.6
P(medium income | not buy) = 2/5 = 0.4
P(game lover | not buy) = 1/5 = 0.2
P(fair credit | not buy) = 2/5 = 0.4
   Similarly, use the formula above to compute the likelihood of the test case given that no computer is bought:

P(X | not buy) = 0.6 × 0.4 × 0.2 × 0.4 ≈ 0.019
   Finally, applying $P(X\mid C_i)P(C_i)$ to both classes gives:

P(C_buy) = 9/14 = 0.643
P(C_not buy) = 5/14 = 0.357
P(X | buy) P(C_buy) = 0.044 × 0.643 = 0.028
P(X | not buy) P(C_not buy) = 0.019 × 0.357 = 0.007

Since 0.028 > 0.007, the naive Bayes classifier predicts that this customer will buy a computer.
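The whole calculation can also be reproduced programmatically. The sketch below is our own code: it re-enters the training table as a pandas DataFrame, counts the conditional frequencies, and compares the two unnormalized posteriors:

import pandas as pd

# Training table from the example above (age, income, game hobby, credit, buy).
rows = [
    ("young", "high", "no", "fair", "no"), ("young", "high", "no", "excellent", "no"),
    ("middle-aged", "high", "no", "fair", "yes"), ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"), ("senior", "low", "yes", "excellent", "no"),
    ("middle-aged", "low", "yes", "excellent", "yes"), ("young", "medium", "no", "fair", "no"),
    ("young", "low", "yes", "fair", "yes"), ("senior", "medium", "yes", "fair", "yes"),
    ("young", "medium", "yes", "excellent", "yes"), ("middle-aged", "medium", "no", "excellent", "yes"),
    ("middle-aged", "high", "yes", "fair", "yes"), ("senior", "medium", "no", "excellent", "no"),
]
df = pd.DataFrame(rows, columns=["age", "income", "games", "credit", "buy"])

# Test case: young, medium income, game lover, fair credit.
x = {"age": "young", "income": "medium", "games": "yes", "credit": "fair"}

for label in ["yes", "no"]:
    sub = df[df["buy"] == label]
    prior = len(sub) / len(df)                   # P(C_i)
    likelihood = 1.0
    for attr, val in x.items():                  # P(X | C_i) = prod_k P(x_k | C_i)
        likelihood *= (sub[attr] == val).mean()
    print(label, round(prior, 3), round(likelihood, 3), round(prior * likelihood, 3))
# Prints roughly 0.643, 0.044, 0.028 for "yes" and 0.357, 0.019, 0.007 for "no".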

4、 How to compute probabilities for continuous data

   The following table records whether a computer was bought at various income levels. Can these data be used to predict whether a middle-aged customer with an income of 121, no game hobby, and a fair credit rating will buy a computer?

id   Income   Buys computer
1    125      no
2    100      no
3    70       no
4    120      no
5    95       yes
6    60       no
7    220      no
8    85       yes
9    75       no
10   90       yes

   Income here is a continuous attribute, so the probability-estimation method used above for discrete data does not apply. For continuous data we assume that the incomes of each class follow their own normal distribution, estimate the mean and variance of the two distributions from the data, and then evaluate the density at an income of 121, for example to obtain the probability of not buying a computer, as shown below:

For a continuous attribute $x_k$ and class $C_i$, the class-conditional probability is modeled with a normal density:

$$P(x_k\mid C_i)=\frac{1}{\sqrt{2\pi}\,\sigma_{C_i}}\exp\!\left(-\frac{(x_k-\mu_{C_i})^2}{2\sigma_{C_i}^2}\right)$$

where $\mu_{C_i}$ and $\sigma_{C_i}$ are the mean and standard deviation of the attribute estimated from the training samples of class $C_i$. Evaluating this density at income = 121 for each class gives the required probabilities.
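A small sketch of that calculation (our own code; it uses the sample mean and sample standard deviation, so the exact figures shown in the original post may differ slightly depending on which variance estimate was used):

import numpy as np

# Incomes from the table, split by class.
income_no = np.array([125, 100, 70, 120, 60, 220, 75], dtype=float)
income_yes = np.array([95, 85, 90], dtype=float)

def gaussian_density(x, samples):
    # Normal density with mean and standard deviation estimated from the class samples.
    mu, sigma = samples.mean(), samples.std(ddof=1)
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

x = 121.0
print("P(income=121 | not buy):", gaussian_density(x, income_no))   # roughly 7e-3
print("P(income=121 | buy):    ", gaussian_density(x, income_yes))  # roughly 4e-10

With these estimates the density at income = 121 is far higher for the "not buy" class, so the income evidence alone favours not buying.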

5、 Characteristics of the naive Bayes classifier

  • Attributes can be discrete or continuous
  • Solid mathematical foundation and stable classification efficiency
  • Relatively insensitive to missing and noisy data
  • Classification works very well when the attributes are conditionally independent

6、 Iris classification with a Bayesian classifier

6.1 Introduction to iris


   Iris (Latin name: Iris L.) is a genus of perennial herbs in the family Iridaceae (monocotyledons); the flowers are large and showy and have high ornamental value. There are about 300 species of iris, and the Iris dataset contains three of them: Iris setosa, Iris versicolour, and Iris virginica, with 50 samples of each, 150 samples in total. Each sample has four attributes: sepal length, sepal width, petal length, and petal width, which are used to predict which of the three species a flower belongs to.

   A few rows of the dataset are shown below:

[Figure: sample rows of the Iris dataset]

6.2 Classification code

# Import the Gaussian naive Bayes classifier
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
import numpy as np
import pandas as pd

# Load the dataset: column 0 is an Id, columns 1-4 are the four features, column 5 is the species label
data_url = "Iris.csv"
df = pd.read_csv(data_url)
X = df.iloc[:, 1:5]
y = df.iloc[:, 5]

# Hold out 20% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Use Gaussian naive Bayes
######## Begin ########
clf = GaussianNB()
######## End ########
clf.fit(X_train, y_train)

# Evaluate: fraction of correctly classified test samples
y_pred = clf.predict(X_test)
acc = np.sum(y_test == y_pred) / X_test.shape[0]
print("Test Acc:%.3f" % acc)

[Figure: program output showing the test accuracy]
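The script assumes a local Iris.csv whose first column is an Id and whose sixth column is the species label. If that file is not at hand, the same experiment can be run against the copy of the dataset bundled with scikit-learn (a sketch, not part of the original exercise):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Built-in Iris data: 150 samples, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
# score() returns the mean accuracy on the held-out test set, equivalent to the manual calculation above.
print("Test Acc:%.3f" % clf.score(X_test, y_test))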


Copyright notice
This article was written by [Don't wait for brother shy to develop]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/160/202206090700558781.html