[Bayesian classification 2] Naive Bayes classifier
2022-06-26 20:38:00 【NoBug】
1. Review of Bayesian decision theory
1.1 Classification principle
Classification principle. Let $Y = \{c_1, c_2, ..., c_N\}$ be the set of $N$ possible class labels, and let $\lambda_{ij}$ denote the loss incurred by misclassifying a sample whose true label is $c_j$ as $c_i$. Then $R(c_i|x)$, the expected loss of classifying sample $x$ as $c_i$ (also called the conditional risk of sample $x$), is defined by the formula $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$, where $P(c_j|x)$ is the posterior probability that $x$ belongs to class $c_j$. Our aim is to evaluate this expected loss for each candidate label and find the label that makes it smallest.
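To make the formula concrete, here is a minimal sketch in Python (not from the original article; the loss matrix and posterior values are made-up numbers) that evaluates the conditional risk for every candidate label:

```python
import numpy as np

# Hypothetical 3-class problem. lam[i][j] is lambda_ij: the loss of
# predicting class c_i when the true class is c_j.
lam = np.array([[0.0, 1.0, 4.0],
                [2.0, 0.0, 1.0],
                [1.0, 3.0, 0.0]])

# Posterior probabilities P(c_j | x) for one sample x (they sum to 1).
posterior = np.array([0.2, 0.5, 0.3])

# R(c_i | x) = sum_j lambda_ij * P(c_j | x): one risk value per candidate label.
risk = lam @ posterior
print(risk)           # -> [1.7 0.7 1.7]
print(risk.argmin())  # -> 1: predicting c_2 minimizes the expected loss
```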
1.2 Bayesian classifier
Bayes classifier. Using the principle above, a sample can be assigned to any of the $N$ classes $\{c_1, c_2, ..., c_N\}$, written collectively as $c$. "To minimize the overall risk, simply select on each sample $x$ the class label that makes the conditional risk $R(c|x)$ smallest; this is the Bayes decision rule." The Bayes optimal classifier is $h^*(x)=\arg\min_{c\in Y}R(c|x)$.
1.3 P(c|x)
$h^*(x)=\arg\max_{c\in Y}P(c|x)$. Proof: take the 0-1 misclassification loss $\lambda_{ij}=0$ if $i=j$ and $\lambda_{ij}=1$ otherwise. Substituting into $R(c_i|x)=\sum_{j=1}^N\lambda_{ij}P(c_j|x)$ gives $R(c_i|x)=\sum_{j\neq i}P(c_j|x)$. Since $\sum_{j=1}^NP(c_j|x)=1$, this simplifies to $R(c|x)=1-P(c|x)$, so minimizing the conditional risk is equivalent to maximizing the posterior probability.
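A quick numerical check of this proof (an illustrative sketch): under 0-1 loss the risk vector equals one minus the posterior vector, so minimizing risk and maximizing the posterior pick the same label.

```python
import numpy as np

posterior = np.array([0.2, 0.5, 0.3])       # P(c_j | x), sums to 1
zero_one = 1.0 - np.eye(3)                  # lambda_ij = 0 if i == j, else 1
risk = zero_one @ posterior                 # R(c_i | x) under 0-1 loss

assert np.allclose(risk, 1.0 - posterior)   # R(c|x) = 1 - P(c|x)
assert risk.argmin() == posterior.argmax()  # identical decision either way
```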
1.4 Calculation formula
Calculation formula. $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$, where $P(c)$ is the class prior probability, $P(x|c)$ is the class-conditional probability of sample $x$ given class label $c$ (also known as the "likelihood"), and $P(x)$ is the evidence, which is identical for all classes. The prior $P(c)$ is estimated as the proportion of each class among the samples; the likelihood $P(x|c)$ is estimated by maximum likelihood estimation.
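A small numerical illustration (hypothetical numbers): the posterior is the class-wise product of prior and likelihood, normalized by the evidence $P(x)$; because that normalizer is shared by all classes, it can be dropped when only the argmax matters.

```python
import numpy as np

prior = np.array([0.6, 0.4])         # P(c) for two classes
likelihood = np.array([0.05, 0.20])  # P(x | c) for one sample x

joint = prior * likelihood           # P(c) * P(x | c)
evidence = joint.sum()               # P(x), identical for every class
posterior = joint / evidence         # P(c | x)
print(posterior)                     # -> [0.273 0.727] (approximately)
```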
1.5 Maximum likelihood estimation
Maximum likelihood estimation. To estimate $P(x|c)$, let $D_c$ denote the set of class-$c$ samples in the training set $D$, and assume these samples are independent and identically distributed. The likelihood of $D_c$ is $P(D_c|c)=\prod_{x\in D_c}P(x|c)$. Multiplying many probabilities easily causes numerical underflow, so the logarithm $\log P(D_c|c)$ is generally used instead; what we seek is its maximum.
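The underflow problem is easy to reproduce; a minimal sketch (illustrative, not the article's code) comparing the raw product of many likelihoods with the sum of their logarithms:

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.uniform(0.01, 0.1, size=500)  # 500 per-sample likelihoods P(x | c)

print(np.prod(p))        # 0.0 -- the product underflows float64
print(np.log(p).sum())   # finite log-likelihood, safe to maximize instead
```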
2. Naive Bayes classifier learning notes
2.1 Introduction
$h^*(x)=\arg\max_{c\in Y}P(c|x)$ with $P(c|x)=\frac{P(c)P(x|c)}{P(x)}$. The main difficulty is estimating the class-conditional probability $P(x|c)$: it is the joint probability over all attributes, so estimating it directly runs into combinatorial explosion (and, in the data, missing values). The naive Bayes classifier resolves this with the assumption: "given the class, all attributes are mutually independent" (the attribute conditional independence assumption).
2.2 Knowledge cards
1. Full name: Naive Bayes classifier
2. Also called: the NB algorithm
2.3 Naive Bayes classifier
- Calculate the class-conditional probability $P(x|c)$
Let $d$ be the number of attributes and $x_i$ the value of $x$ on the $i$-th attribute. Under the independence assumption:
$P(x|c)=\prod_{i=1}^dP(x_i|c)$
- Expression of the naive Bayes classifier
As noted in the review above, $P(x)$ is identical for all classes, so:
$h_{nb}(x)=\arg\max_{c\in Y}\ P(c)\prod_{i=1}^dP(x_i|c)$
- Calculate the prior probability $P(c)$
$P(c)=\frac{|D_c|}{|D|}$
- Calculate the class-conditional probability $P(x_i|c)$
(Discrete attributes)
$P(x_i|c)=\frac{|D_{c,x_i}|}{|D_c|}$
(Continuous attributes)
For continuous attributes, a probability density function can be considered, e.g. the two-point (Bernoulli), geometric, binomial, exponential, Poisson, or normal distributions. (A minimal counting-based sketch of the discrete case follows this list.)
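The sketch below (Python, not from the article; the attribute values and toy data are made up) trains a discrete naive Bayes classifier purely by counting, following the formulas above: $P(c)=|D_c|/|D|$ and $P(x_i|c)=|D_{c,x_i}|/|D_c|$.

```python
from collections import Counter, defaultdict

def train(samples, labels):
    class_count = Counter(labels)        # |D_c| for every class c
    cond_count = defaultdict(Counter)    # |D_{c,x_i}| keyed by (attribute index, value)
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            cond_count[c][(i, v)] += 1
    return class_count, cond_count, len(labels)

def predict(x, class_count, cond_count, n):
    best_label, best_score = None, -1.0
    for c, n_c in class_count.items():
        score = n_c / n                              # prior P(c)
        for i, v in enumerate(x):
            score *= cond_count[c][(i, v)] / n_c     # conditional P(x_i | c)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

# Toy data: each sample is a tuple of discrete attribute values.
X = [("green", "curled"), ("green", "curled"), ("pale", "stiff"), ("pale", "curled")]
y = ["good", "good", "bad", "bad"]
model = train(X, y)
print(predict(("green", "curled"), *model))          # -> "good"
```

Note how an (attribute, value) pair never seen for some class makes that class's whole product zero; this is exactly the problem that Laplacian smoothing (Section 2.4) addresses.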
- Example
【Dataset D】

【Prior probability $P(c)$】
$P(\text{good melon}=\text{yes})=\frac{8}{17}\approx 0.471$
$P(\text{good melon}=\text{no})=\frac{9}{17}\approx 0.529$
【Class-conditional probability $P(x_i|c)$】

【Naive Bayes classifier】

Because $0.038 > 6.80\times10^{-5}$, the naive Bayes classifier judges the test sample "测1" to be a "good melon".
2.4 Laplacian smoothing
Motivating problem
In practice, we will definitely encounter , such as : The attribute is “ Knock sound = Crisp ” There are 8 Samples , But this 8 The categories of samples are “ Good melon ” No . So there is : P clear brittle ∣ yes = P ( knock The sound = clear brittle ∣ good melon = yes ) = 0 8 = 0 P_{ Crisp | yes }=P( Knock sound = Crisp | Good melon = yes )=\frac{0}{8}=0 P clear brittle ∣ yes =P( knock The sound = clear brittle ∣ good melon = yes )=80=0, When it is treated as a tandem , Whatever other attributes of the sample are , Even on other attributes, it looks good , The result of classification will be “ Good melon = no ”, It's obviously not reasonable .Laplacian smoothing
Add 1 to the numerator and the number of possible outcomes to the denominator: for the prior, $\hat{P}(c)=\frac{|D_c|+1}{|D|+N}$, where $N$ is the number of classes; for the conditional, $\hat{P}(x_i|c)=\frac{|D_{c,x_i}|+1}{|D_c|+N_i}$, where $N_i$ is the number of possible values of attribute $i$.
Example

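A minimal sketch of the smoothed estimates (Python, not from the article; the "3 possible values" for the knock-sound attribute is an assumption about the watermelon data):

```python
# Laplacian-smoothed estimates (illustrative sketch).

def smoothed_prior(n_c, n, n_classes):
    # P^(c) = (|D_c| + 1) / (|D| + N), with N the number of classes
    return (n_c + 1) / (n + n_classes)

def smoothed_conditional(n_cv, n_c, n_values):
    # P^(x_i|c) = (|D_{c,x_i}| + 1) / (|D_c| + N_i),
    # with N_i the number of possible values of attribute i
    return (n_cv + 1) / (n_c + n_values)

# The zero-count case above: among the 8 "good melon" samples none is
# "knock sound = crisp"; assume that attribute has 3 possible values.
print(smoothed_conditional(0, 8, 3))  # 1/11 ~= 0.091 instead of 0
print(smoothed_prior(8, 17, 2))       # (8+1)/(17+2) ~= 0.474
```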
3. Naive Bayes classifier extensions
3.1 Data processing
The naive Bayes classifier can be used in many ways in real tasks.
For example, if the task demands fast prediction, all the probability estimates involved in the naive Bayes classifier can be computed from the given training set in advance and stored, so that a prediction requires nothing more than a table lookup;
if the task data change frequently, "lazy learning" can be adopted: do no training up front, and estimate the probabilities from the current dataset only when a prediction request arrives;
if data keep arriving, incremental learning is possible: on top of the existing estimates, only the counts involving the attribute values of the new samples need to be updated, as sketched below.
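As a sketch of the incremental case (illustrative, reusing the counting scheme from the earlier naive Bayes sketch): every quantity in the model is a count, so absorbing one new labeled sample is just a few increments.

```python
from collections import Counter, defaultdict

def absorb(x, c, class_count, cond_count):
    # Update only the counts touched by the new sample; the estimates
    # P(c) and P(x_i|c) are re-derived from counts at prediction time,
    # so no retraining pass is needed.
    class_count[c] += 1
    for i, v in enumerate(x):
        cond_count[c][(i, v)] += 1

class_count, cond_count = Counter(), defaultdict(Counter)
absorb(("green", "curled"), "good", class_count, cond_count)
print(class_count["good"], cond_count["good"][(0, "green")])  # -> 1 1
```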
3.2 Collect other information
【The core of the naive Bayes classifier】
$h^*(x)=\arg\max_{c\in Y}P(c|x),\quad P(c|x)=\frac{P(c)P(x|c)}{P(x)}$
In words: $P(\text{category}\mid\text{features})=\frac{P(\text{category})\,P(\text{features}\mid\text{category})}{P(\text{features})}$
【Advantages and disadvantages of the naive Bayes classifier】
- Advantages:
1. The algorithm logic is simple and easy to implement.
2. The time and space costs of classification are low.
- Disadvantages:
1. The naive Bayes model assumes that attributes are mutually independent, an assumption that often fails to hold in practice.
2. When the number of attributes is large or the attributes are strongly correlated, classification performance suffers.
- Remedies:
1. To address these weaknesses of the NB algorithm, methods such as semi-naive Bayes make a moderate improvement by taking some attribute dependencies into account.
【Bayesian network】