[Data Mining] Generative models vs. discriminative models: differences, advantages, and disadvantages
2022-07-26 01:16:00 【Better Bench】

1 Differences
(1) Discriminative model
A discriminative model learns P(y|x): a model or function is used to fit the conditional probability distribution P(y|x) directly, i.e. the probability that the label y occurs given that the input x is observed. Fitting P(y|x) is fitting the relationship from effect back to cause. In actual training this means training the model against the labels and then judging the category directly from the fitted conditional probability; a model fitted in this way is called a discriminative model.
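To make this concrete, here is a minimal sketch of the discriminative idea using logistic regression (one of the discriminative models listed below): the model fits P(y|x) directly from labeled data. It assumes scikit-learn and NumPy are available, and the two-class toy data is made up purely for illustration.

```python
# Minimal sketch: a discriminative model fits P(y|x) directly.
# Assumes scikit-learn and NumPy are installed; the data is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Two synthetic classes in a 2-D feature space.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(3.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X, y)     # learns P(y|x) directly from labels
print(clf.predict_proba([[1.5, 1.5]]))   # estimated P(y|x) for a new point
print(clf.predict([[1.5, 1.5]]))         # class with the largest P(y|x)
```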
(2) Generative model
A generative model also targets P(y|x), but obtains it through Bayes' rule. The formula decomposes into three parts: $P(y|x)=\frac{P(x|y)P(y)}{P(x)}$.
- P(x|y)P(y) = P(x,y) is the joint probability distribution; this is what the generative model actually fits;
- P(x) is the marginal probability of x;
- P(y) is the class prior, which can be obtained directly from the labels.
The training process of a generative model can therefore be interpreted as fitting a probability distribution (in essence fitting P(x|y), because P(x,y)=P(x|y)P(y)); a new sample is then assigned to the class for which this probability is largest. Fitting P(x|y) is fitting the relationship from cause to effect, and a model fitted in this way is called a generative model. Put plainly, a generative model can also sample new data from the joint probability distribution it has learned.
Note: P(x|y) is the probability that X occurs given that Y has occurred; P(x,y) is the joint probability distribution, P(x,y)=P(X=x and Y=y), i.e. the probability distribution over X and Y together.
Summary: directly fitting P(y|x) gives a discriminative model; directly fitting the joint distribution P(x,y), and thereby obtaining P(y|x) indirectly, gives a generative model.
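To make the recipe above concrete, here is a hedged from-scratch sketch of a generative classifier in the spirit of Gaussian naive Bayes (naive Bayes appears in the list of generative models below): the class prior P(y) is estimated from label counts, P(x|y) is modeled as a per-class axis-aligned Gaussian, and a new point is assigned the class that maximizes P(x|y)P(y). The function names and toy data are illustrative assumptions, not part of the original article.

```python
# Minimal generative sketch: fit P(y) from labels, fit P(x|y) per class,
# classify with argmax_y P(x|y) P(y).  All names and data are illustrative.
import numpy as np

def fit_generative(X, y):
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = {
            "prior": len(Xc) / len(X),      # P(y) read off from the labels
            "mean": Xc.mean(axis=0),        # parameters of the Gaussian P(x|y)
            "var": Xc.var(axis=0) + 1e-9,   # small constant for numerical stability
        }
    return params

def log_joint(x, p):
    # log P(x|y) + log P(y) under the per-class Gaussian model
    log_px_given_y = -0.5 * np.sum(
        np.log(2 * np.pi * p["var"]) + (x - p["mean"]) ** 2 / p["var"])
    return log_px_given_y + np.log(p["prior"])

def predict(x, params):
    # choose the class that maximizes the joint probability P(x, y)
    return max(params, key=lambda c: log_joint(x, params[c]))

# Toy usage with the same synthetic two-class data as the earlier sketch.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(3.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
params = fit_generative(X, y)
print(predict(np.array([1.5, 1.5]), params))
```

Because this sketch fits the joint distribution, the same params could also be used to sample new (x, y) pairs or to evaluate the marginal P(x), which the discriminative sketch above cannot do.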
2 Examples
(1) Common discriminative models
- K-nearest neighbors (KNN)
- Linear regression (Linear Regression)
- Logistic regression (Logistic Regression)
- Neural networks (NN)
- Support vector machines (SVM)
- Gaussian processes (Gaussian Process)
- Conditional random fields (CRF)
- Classification and regression trees (CART, Classification and Regression Tree)
(2) Common generative models
- LDA topic model
- Naive Bayes
- Gaussian mixture model
- Hidden Markov model (HMM)
- Bayesian networks
- Sigmoid belief networks
- Markov random fields (Markov Random Fields)
- Deep belief networks (DBN)
3 Advantages and disadvantages
(1) Generative models
Advantages:
- A generative model gives the joint distribution, from which not only the conditional probability distribution but also other information, such as the marginal probability distribution P(x), can be computed. If the marginal probability of an input sample is very small, the learned model is probably not suitable for classifying that sample and the classification result may be poor; this amounts to outlier detection (a short sketch follows at the end of this subsection).
- Generative models converge relatively fast: when the number of samples is large, a generative model approaches the true model more quickly.
- Generative models can handle problems with hidden variables; for example, the Gaussian mixture model is a generative method with hidden variables.
Disadvantages:
- Although the joint distribution provides more information, it also requires more samples and more computation. Estimating the class-conditional distributions accurately requires a larger number of samples, and much of the information in the class-conditional probabilities is not needed for classification, so if the task is classification only, computing it wastes resources.
- In practice, generative models in most cases do not perform as well as discriminative models.
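As a hedged illustration of the marginal-probability point above, the sketch below evaluates P(x) = Σ_y P(x|y)P(y) under per-class Gaussian parameters of the same form as the earlier generative sketch (the concrete numbers here are made-up assumptions). An unusually small P(x) suggests the sample lies far from anything the model was fitted on, which is the outlier-detection use mentioned in the advantages.

```python
# Outlier detection via the marginal P(x) = sum_y P(x|y) P(y).
# The per-class parameters below are illustrative, in the same form as the
# params produced by the earlier generative sketch.
import numpy as np

params = {
    0: {"prior": 0.5, "mean": np.array([0.0, 0.0]), "var": np.array([1.0, 1.0])},
    1: {"prior": 0.5, "mean": np.array([3.0, 3.0]), "var": np.array([1.0, 1.0])},
}

def log_joint(x, p):
    # log P(x|y) + log P(y) under the per-class Gaussian model
    return (-0.5 * np.sum(np.log(2 * np.pi * p["var"])
                          + (x - p["mean"]) ** 2 / p["var"])
            + np.log(p["prior"]))

def marginal(x, params):
    # P(x) = sum over classes of P(x|y) P(y)
    return sum(np.exp(log_joint(x, p)) for p in params.values())

print(marginal(np.array([1.5, 1.5]), params))     # near the classes: noticeable P(x)
print(marginal(np.array([50.0, -50.0]), params))  # far away: vanishingly small P(x)
```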
(2) Discriminative model
Advantages:
- It saves computing resources, and the number of samples required is smaller than for a generative model.
- Its accuracy is often higher than that of a generative model.
- Because it learns P(y|x) directly and does not need to model the class-conditional probabilities, it allows us to abstract the input (for example through dimensionality reduction or feature construction), which can simplify the learning problem.
Disadvantages:
- It lacks the advantages of a generative model listed above: it gives no joint distribution, cannot sample new data, and has no natural way to handle hidden variables.