[Reading notes] - 2: Learning machine learning classification through the naive Bayes model
2022-07-28 17:59:00 【jsBeSelf】
Notes on Chapter 2 of 《Python Machine Learning By Example》, Third Edition
1 Introduction
From this chapter, we can learn:
- What machine learning classification is, and what kinds there are
- Bayes' theorem, maximum a posteriori (MAP) estimation, and the mechanism of the Bayes classifier
- How to evaluate classification models
- How to fine-tune the model
2 Machine learning classification
In fact, classification is about learning a mapping from sample features to target categories.
Usually, there are three kinds of classification tasks: binary classification, multiclass classification, and multi-label classification.
- Binary classification: as the name suggests, there are only two target categories. Classic examples are spam filtering, ad click-through prediction, and customer churn prediction.
- Multiclass classification: also called multinomial classification; there are more than two target categories. The most classic example is handwritten digit recognition, which is also often used to evaluate the quality of classifiers.
- Multi-label classification: easily confused with multiclass classification, but the difference is clear. In multiclass classification, an object or image belongs to exactly one of the target categories, whereas in multi-label classification it can carry several labels at once, for example the functions of a protein (transport, antibody, storage, and so on) or the genres of a film (horror, adventure, science fiction, and so on).
Many models can be used to solve classification problems, such as Naive Bayes, SVM, decision trees, and logistic regression. The next part introduces the mechanism of Naive Bayes.
3 Naive Bayes
- Naive: to simplify the probability calculations, the features are assumed to be independent of each other. Bayes: the method comes from Bayes' theorem.
- Bayes' theorem: P(A∩B) = P(A)·P(B|A) = P(B)·P(A|B), which relates the two conditional probabilities; rearranging gives P(A|B) = P(B|A)·P(A)/P(B). In my understanding, A represents a hypothesis chosen in a sampling step and B represents an observed event. Take the coin-tossing example from the book: there is one fair coin and one unfair coin, and the chances of picking each are equal. Given that a toss comes up heads, what is the probability that the unfair coin was picked? Here A means picking the unfair coin in one draw (probability 0.5), and B is the event of tossing heads. Since P(B) is the same for every hypothesis, it can usually be ignored (if you really want it, compute it with the law of total probability), so we only compare P(A|B) ∝ P(B|A)·P(A). Here P(A) is the prior, which can be obtained by analyzing the label distribution in the data set and is regarded as our estimate of the world; P(B|A) is the likelihood, which can simply be understood as a probability (the 'Knowledge supplement' below says more about this); and P(A|B) is the posterior: given the observed data, the probability that the sample belongs to each category. (A numeric check of the coin example appears after this list.)
- The book works through a calculated example, which I won't reproduce here; there are many worked examples online. One caveat, though: when computing likelihoods, the sample may be small enough that some feature value never appears under some category. That makes the likelihood 0, which in turn forces the posterior for that category to 0 regardless of the other features, and this hurts classification. The fix is Laplace smoothing: pick a smoothing parameter, add it to the numerator of each likelihood, and add (number of possible values of the feature) × (smoothing parameter) to the denominator. This solves the problem (see the smoothing sketch after this list).
- In code, you can implement this in order: compute the label distribution (the prior), compute the likelihoods, compute the evidence via the law of total probability, and finally compute the posterior for each test sample. You can also use the scikit-learn library to do it directly (see the last sketch after this list).
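
To make the coin example concrete, here is a minimal numeric check. The text above does not fix the unfair coin's bias, so the 0.9 heads probability below is my assumption:

```python
# Bayes' theorem on the coin example: P(unfair | heads) ∝ P(heads | unfair) * P(unfair)
# Assumption: the unfair coin lands heads with probability 0.9 (not stated above).
p_unfair = 0.5                # prior: both coins are equally likely to be picked
p_fair = 0.5
p_heads_given_unfair = 0.9    # assumed likelihood for the unfair coin
p_heads_given_fair = 0.5

# Evidence P(heads) via the law of total probability
p_heads = p_heads_given_unfair * p_unfair + p_heads_given_fair * p_fair

# Posterior: probability the coin is unfair, given that we saw heads
print(p_heads_given_unfair * p_unfair / p_heads)   # 0.45 / 0.70 ≈ 0.643
```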
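For the Laplace smoothing point, a minimal sketch for a single categorical feature within one class (the function and variable names are mine; `alpha` is the smoothing parameter):

```python
from collections import Counter

def smoothed_likelihoods(feature_values, n_possible_values, alpha=1.0):
    """P(value | class) with Laplace smoothing for one feature within one class.

    Without smoothing, a value never seen in this class would get likelihood 0
    and zero out the whole posterior product.
    """
    counts = Counter(feature_values)
    denom = len(feature_values) + alpha * n_possible_values
    return {v: (counts[v] + alpha) / denom for v in range(n_possible_values)}

# The feature takes values {0, 1, 2}; value 2 never appears in this class,
# but still receives a small nonzero likelihood.
print(smoothed_likelihoods([0, 0, 1, 0, 1], n_possible_values=3))
# {0: 0.5, 1: 0.375, 2: 0.125}
```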
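And for the scikit-learn route, a minimal sketch using `BernoulliNB` on made-up binary features (the book may use a different variant; `alpha` is the same Laplace smoothing parameter):

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Toy binary features: 4 training samples, 2 classes (data made up for illustration)
X_train = np.array([[1, 0, 1],
                    [1, 1, 0],
                    [0, 0, 1],
                    [0, 1, 1]])
y_train = np.array([1, 1, 0, 0])

clf = BernoulliNB(alpha=1.0)      # alpha is the Laplace smoothing parameter
clf.fit(X_train, y_train)

X_test = np.array([[1, 0, 0]])
print(clf.predict(X_test))        # predicted class label
print(clf.predict_proba(X_test))  # posterior probabilities per class
```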
4 Classification model evaluation
There are many metrics for evaluating classification models: Precision, Recall, F1 score, AUC, etc.
- Precision and Recall are defined via the confusion matrix, whose cells count True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN) (the layout varies between sources). Precision is TP/(TP+FP): the proportion of predicted positives that are correct, so we divide by the number of all predicted positives. Recall is TP/(TP+FN): the proportion of actual positives that are found, so we divide by the number of all actual positives.
- F1 score is the harmonic mean of Precision and Recall, i.e. F1 = 2·P·R / (P + R). The F1 score is high only when both Precision and Recall are high (see the first sketch after this list).
- AUC is the area under the ROC curve. The full name of ROC is receiver operating characteristic curve. What does 'receiver operating characteristic' mean? The horizontal axis of the ROC curve is FPR (False Positive Rate), i.e. FP/(FP+TN): among the negative samples, the probability of being classified as positive. The vertical axis is TPR (True Positive Rate), which is just the Recall defined above. Because the model outputs a continuous probability, different receivers apply different thresholds: one receiver may call a sample positive when the probability exceeds 0.5, and from that threshold we can compute a (FPR, TPR) pair, giving one point on the plot; another may require 0.8, giving another point. Connecting the points produced by all possible thresholds yields the ROC curve. A common rule of thumb is that an AUC of 0.7-0.8 is acceptable, and above 0.8 is good. (The second sketch after this list shows the threshold sweep.)
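
A minimal sketch of computing Precision, Recall, and F1 with scikit-learn (the labels below are made up); the hand computation from TP, FP, FN gives the same numbers:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))  # rows: true class, columns: predicted class
p = precision_score(y_true, y_pred)      # TP / (TP + FP)
r = recall_score(y_true, y_pred)         # TP / (TP + FN)
print(p, r, 2 * p * r / (p + r))         # last value equals f1_score(y_true, y_pred)
```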
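And a minimal sketch of the threshold sweep behind the ROC curve (the scores below are made up):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # predicted probability of the positive class

# roc_curve sweeps the threshold and returns one (FPR, TPR) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(thresholds, fpr, tpr)))
print(roc_auc_score(y_true, y_score))      # area under the resulting curve
```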
5 Fine-tuning the model
The cross-validation introduced in the previous chapter can be used to tune the model's parameters. The parameter to tune here is the smoothing coefficient in Laplace smoothing: compute the average AUC under each candidate coefficient, then pick the coefficient with the highest mean AUC, as in the sketch below.
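
A minimal sketch of this tuning loop with scikit-learn's `GridSearchCV` (the synthetic dataset and the candidate alpha values are mine):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Synthetic data for illustration; MultinomialNB requires non-negative features
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X = abs(X)

search = GridSearchCV(
    MultinomialNB(),
    param_grid={"alpha": [0.1, 0.5, 1.0, 2.0, 5.0]},  # candidate smoothing values
    scoring="roc_auc",                                 # mean AUC across folds
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```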
Knowledge supplement
1) There are three kinds of Naive Bayes, used for different feature distributions (a sketch appears at the end of this supplement): 1. Gaussian Naïve Bayes handles continuous feature values, assuming the values within each category follow a normal distribution; 2. Multinomial Naïve Bayes handles discrete features with more than two possible values (e.g. counts); 3. Bernoulli Naïve Bayes handles the Bernoulli case, where each feature has only two possible values.
2) Likelihood and probability estimate different things. Probability (density) expresses how possible the sample X = x is given the parameter θ; likelihood expresses how plausible it is that the parameter θ is the true value given the sample X = x.
In my understanding, for example: three people might each play alone at the nearby court, and anyone playing there might break the glass. While the glass is still intact, we discuss the probability of each of them breaking it (a conditional probability: given that this person is playing, he breaks the glass). Once the glass is broken, we reason backwards about how plausible it is that each of them broke it, which is the likelihood. Here each person plays the role of a category, and the broken glass is the observed sample.
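
A minimal sketch of picking the Naive Bayes variant by feature type, as promised in point 1) (the toy data is made up):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 1, 0, 1])

# Continuous features -> GaussianNB (per-class normal distribution assumed)
X_cont = np.array([[1.2], [3.1], [0.9], [2.8]])
print(GaussianNB().fit(X_cont, y).predict([[3.0]]))

# Discrete counts -> MultinomialNB
X_counts = np.array([[2, 0], [0, 3], [1, 0], [0, 2]])
print(MultinomialNB().fit(X_counts, y).predict([[0, 1]]))

# Binary features -> BernoulliNB
X_bin = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
print(BernoulliNB().fit(X_bin, y).predict([[0, 1]]))
```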