[Reading notes] - 2: Learning machine learning classification through the naive Bayes model
2022-07-28 17:59:00 【jsBeSelf】
Notes on Chapter 2 of 《Python Machine Learning By Example》, Third Edition
1 Introduction
From this chapter, we can learn:
- What machine learning classification is, and what kinds there are
- Bayes' theorem, maximum a posteriori (MAP) estimation, and the mechanism of the Bayes classifier
- How to evaluate classification models
- How to fine-tune the model
2 Machine learning classification
In fact, classification is about learning a mapping from sample features to target categories.
Usually, there are three kinds of classification tasks: binary classification, multiclass classification, and multi-label classification.
- Binary classification: as the name suggests, there are only two target categories. Classic examples are spam filtering, ad click-through prediction, and customer churn prediction.
- Multiclass classification: also called multinomial classification; there are more than two target categories. The most classic example is handwritten digit recognition, which is also often used to evaluate the quality of classifiers.
- Multi-label classification: easily confused with multiclass classification, but the difference is clear. In multiclass classification, an object or image belongs to exactly one of the target categories, whereas in multi-label classification it can carry several labels at once, for example the functions of a protein (transport, antibody, storage, and so on) or the genres of a film (horror, adventure, science fiction, and so on).
Many models can be used to solve classification problems, such as Naive Bayes, SVM, decision trees, and logistic regression. The next part introduces the mechanism of Naive Bayes.
3 Naive Bayes
- Naive: to simplify the probability calculations, the features are assumed to be independent of each other. Bayes: the method comes from Bayes' theorem.
- Bayes' theorem: P(A∩B) = P(A)·P(B|A) = P(B)·P(A|B), which relates the two conditional probabilities; rearranging gives P(A|B) = P(B|A)·P(A)/P(B). In my understanding, A represents a hypothesis chosen in a sampling step and B represents an observed event. Take the coin-tossing example from the book: there is one fair coin and one unfair coin, and the chances of picking each are equal. Given that a toss comes up heads, what is the probability that the unfair coin was picked? Here A means picking the unfair coin in one draw (probability 0.5), and B is the event of tossing heads. Since P(B) is the same for every hypothesis, it can usually be ignored (if you really want it, compute it with the law of total probability), so we only compare P(A|B) ∝ P(B|A)·P(A). Here P(A) is the prior, which can be obtained by analyzing the label distribution in the data set and is regarded as our estimate of the world; P(B|A) is the likelihood, which can simply be understood as a probability (the 'Knowledge supplement' below says more about this); and P(A|B) is the posterior: given the observed data, the probability that the sample belongs to each category. (A numeric check of the coin example appears after this list.)
- The book works through a calculated example, which I won't reproduce here; there are many worked examples online. One caveat, though: when computing likelihoods, the sample may be small enough that some feature value never appears under some category. That makes the likelihood 0, which in turn forces the posterior for that category to 0 regardless of the other features, and this hurts classification. The fix is Laplace smoothing: pick a smoothing parameter, add it to the numerator of each likelihood, and add (number of possible values of the feature) × (smoothing parameter) to the denominator. This solves the problem (see the smoothing sketch after this list).
- In code, you can implement this in order: compute the label distribution (the prior), compute the likelihoods, compute the evidence via the law of total probability, and finally compute the posterior for each test sample. You can also use the scikit-learn library to do it directly (see the last sketch after this list).
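
To make the coin example concrete, here is a minimal numeric check. The text above does not fix the unfair coin's bias, so the 0.9 heads probability below is my assumption:

```python
# Bayes' theorem on the coin example: P(unfair | heads) ∝ P(heads | unfair) * P(unfair)
# Assumption: the unfair coin lands heads with probability 0.9 (not stated above).
p_unfair = 0.5                # prior: both coins are equally likely to be picked
p_fair = 0.5
p_heads_given_unfair = 0.9    # assumed likelihood for the unfair coin
p_heads_given_fair = 0.5

# Evidence P(heads) via the law of total probability
p_heads = p_heads_given_unfair * p_unfair + p_heads_given_fair * p_fair

# Posterior: probability the coin is unfair, given that we saw heads
print(p_heads_given_unfair * p_unfair / p_heads)   # 0.45 / 0.70 ≈ 0.643
```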
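For the Laplace smoothing point, a minimal sketch for a single categorical feature within one class (the function and variable names are mine; `alpha` is the smoothing parameter):

```python
from collections import Counter

def smoothed_likelihoods(feature_values, n_possible_values, alpha=1.0):
    """P(value | class) with Laplace smoothing for one feature within one class.

    Without smoothing, a value never seen in this class would get likelihood 0
    and zero out the whole posterior product.
    """
    counts = Counter(feature_values)
    denom = len(feature_values) + alpha * n_possible_values
    return {v: (counts[v] + alpha) / denom for v in range(n_possible_values)}

# The feature takes values {0, 1, 2}; value 2 never appears in this class,
# but still receives a small nonzero likelihood.
print(smoothed_likelihoods([0, 0, 1, 0, 1], n_possible_values=3))
# {0: 0.5, 1: 0.375, 2: 0.125}
```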
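And for the scikit-learn route, a minimal sketch using `BernoulliNB` on made-up binary features (the book may use a different variant; `alpha` is the same Laplace smoothing parameter):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary features: 4 training samples, 2 classes (data made up for illustration)
X_train = np.array([[1, 0, 1],
                    [1, 1, 0],
                    [0, 0, 1],
                    [0, 1, 1]])
y_train = np.array([1, 1, 0, 0])

clf = BernoulliNB(alpha=1.0)      # alpha is the Laplace smoothing parameter
clf.fit(X_train, y_train)

X_test = np.array([[1, 0, 0]])
print(clf.predict(X_test))        # predicted class label
print(clf.predict_proba(X_test))  # posterior probabilities per class
```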
4 Classification model evaluation
There are many metrics for evaluating classification models: Precision, Recall, F1 score, AUC, etc.
- Precision and Recall are defined via the confusion matrix, whose cells count True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN) (the layout varies between sources). Precision is TP/(TP+FP): the proportion of predicted positives that are correct, so we divide by the number of all predicted positives. Recall is TP/(TP+FN): the proportion of actual positives that are found, so we divide by the number of all actual positives.
- F1 score is the harmonic mean of Precision and Recall, i.e. F1 = 2·P·R / (P + R). The F1 score is high only when both Precision and Recall are high (see the first sketch after this list).
- AUC is the area under the ROC curve. The full name of ROC is receiver operating characteristic curve. What does 'receiver operating characteristic' mean? The horizontal axis of the ROC curve is FPR (False Positive Rate), i.e. FP/(FP+TN): among the negative samples, the probability of being classified as positive. The vertical axis is TPR (True Positive Rate), which is just the Recall defined above. Because the model outputs a continuous probability, different receivers apply different thresholds: one receiver may call a sample positive when the probability exceeds 0.5, and from that threshold we can compute a (FPR, TPR) pair, giving one point on the plot; another may require 0.8, giving another point. Connecting the points produced by all possible thresholds yields the ROC curve. A common rule of thumb is that an AUC of 0.7-0.8 is acceptable, and above 0.8 is good. (The second sketch after this list shows the threshold sweep.)
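
A minimal sketch of computing Precision, Recall, and F1 with scikit-learn (the labels below are made up); the hand computation from TP, FP, FN gives the same numbers:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
p = precision_score(y_true, y_pred)      # TP / (TP + FP)
r = recall_score(y_true, y_pred)         # TP / (TP + FN)
print(p, r, 2 * p * r / (p + r))         # last value equals f1_score(y_true, y_pred)
```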
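And a minimal sketch of the threshold sweep behind the ROC curve (the scores below are made up):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # predicted probability of the positive class

# roc_curve sweeps the threshold and returns one (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(thresholds, fpr, tpr)))
print(roc_auc_score(y_true, y_score))      # area under the resulting curve
```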
5 Fine-tuning the model
The cross-validation introduced in the previous chapter can be used to tune the model's parameters. The parameter to tune here is the smoothing coefficient in Laplace smoothing: compute the average AUC under each candidate coefficient, then pick the coefficient with the highest mean AUC, as in the sketch below.
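
A minimal sketch of this tuning loop with scikit-learn's `GridSearchCV` (the synthetic dataset and the candidate alpha values are mine):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Synthetic data for illustration; MultinomialNB requires non-negative features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = abs(X)

search = GridSearchCV(
    MultinomialNB(),
    param_grid={"alpha": [0.1, 0.5, 1.0, 2.0, 5.0]},  # candidate smoothing values
    scoring="roc_auc",                                 # mean AUC across folds
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```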
Knowledge supplement
1) There are three kinds of Naive Bayes, used for different feature distributions (a sketch appears at the end of this supplement): 1. Gaussian Naïve Bayes handles continuous feature values, assuming the values within each category follow a normal distribution; 2. Multinomial Naïve Bayes handles discrete features with more than two possible values (e.g. counts); 3. Bernoulli Naïve Bayes handles the Bernoulli case, where each feature has only two possible values.
2) Likelihood and probability estimate different things. Probability (density) expresses how possible the sample X = x is given the parameter θ; likelihood expresses how plausible it is that the parameter θ is the true value given the sample X = x.
In my understanding, for example: three people might each play alone at the nearby court, and anyone playing there might break the glass. While the glass is still intact, we discuss the probability of each of them breaking it (a conditional probability: given that this person is playing, he breaks the glass). Once the glass is broken, we reason backwards about how plausible it is that each of them broke it, which is the likelihood. Here each person plays the role of a category, and the broken glass is the observed sample.
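
A minimal sketch of picking the Naive Bayes variant by feature type, as promised in point 1) (the toy data is made up):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 1, 0, 1])

# Continuous features -> GaussianNB (per-class normal distribution assumed)
X_cont = np.array([[1.2], [3.1], [0.9], [2.8]])
print(GaussianNB().fit(X_cont, y).predict([[3.0]]))

# Discrete counts -> MultinomialNB
X_counts = np.array([[2, 0], [0, 3], [1, 0], [0, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 1]]))

# Binary features -> BernoulliNB
X_bin = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
print(BernoulliNB().fit(X_bin, y).predict([[0, 1]]))
```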