[Data Mining] Final Review: Chapter 3
2022-06-21 06:17:00 【A delicious little pig】
Chapter 3: Classification
1. Definition of classification
Classification learns from a data set to construct a classification model (a prediction function) that predicts the class label of unknown samples, for example predicting whether an email is spam from its title and content. Both classification and regression make predictions, but the output of classification is discrete or nominal, whereas the output of regression is a continuous value. For example, predicting whether a bank customer will churn is a classification task, while predicting a shopping mall's total turnover for the next year is a regression task.
2. Application fields of classification
Classification and regression methods are now widely used across industries. In finance, classifiers are used to predict the future direction of a stock. In medicine, they are used to predict a diagnosis. In marketing, historical sales data is used to predict whether certain goods will sell and in which region an advertisement should be placed.
3. General steps for classification
(1) Divide the data set into a training set and a test set;
(2) Learn from the training set to build a classification model (the model can be, for example, a decision tree or a set of classification rules);
(3) Use the classification model to classify the test set, and evaluate its classification accuracy and other performance measures;
(4) Use a classification model with high classification accuracy to classify future samples whose class labels are unknown.
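The four steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data, the function names, and the nearest-centroid classifier are all made up for this sketch (the notes mention decision trees or classification rules as possible models), and any other classifier could be swapped in at step 2.

```python
import random

# Hypothetical toy data: (feature vector, class label). Not from the original notes.
DATA = [
    ((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.3, 0.9), "A"), ((0.9, 1.4), "A"),
    ((1.1, 0.7), "A"), ((3.0, 3.2), "B"), ((3.4, 2.9), "B"), ((2.8, 3.5), "B"),
    ((3.1, 2.7), "B"), ((3.3, 3.3), "B"),
]

def train_test_split(data, test_ratio=0.3, seed=0):
    """Step 1: shuffle, then split the data set into training and test sets."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def fit_nearest_centroid(train):
    """Step 2: learn a model; here, one mean vector (centroid) per class."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(s / counts[y] for s in acc) for y, acc in sums.items()}

def predict(model, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, model[y])))

def accuracy(model, test):
    """Step 3: fraction of test samples whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in test) / len(test)

train, test = train_test_split(DATA)
model = fit_nearest_centroid(train)
print("test accuracy:", accuracy(model, test))

# Step 4: classify a new sample whose class label is unknown.
print("new sample ->", predict(model, (2.9, 3.0)))
```

The test set is kept out of training so that the accuracy in step 3 estimates how the model behaves on samples it has not seen.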
4. Categories of classification algorithms
Classification methods:
- Decision-tree-based classification
- Bayesian classification
- Nearest neighbor classification
- Neural networks
- Support vector machines, etc.
Regression methods:
- Linear regression
- Nonlinear regression
- Logistic regression, etc.
5. Decision tree classification algorithms
ID3, C4.5, CART, etc.
6. ID3 decision tree
The ID3 classification algorithm uses information gain as its attribute selection criterion. The basic idea is as follows: first, examine all attributes and select the one with the largest information gain as the decision tree node; create one branch for each distinct value of that attribute; then recursively apply the same method to the subset on each branch to build the subtree below it, until every subset contains only samples of a single class. The resulting decision tree can be used to classify new samples.
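Definition of information entropy (computed from the class probabilities):
$$H(D) = -\sum_{k=1}^{K} p_k \log_2 p_k$$
where $p_k$ is the proportion of samples in data set $D$ that belong to class $k$.
Definition of information gain (entropy before the split minus the weighted entropy after the split):
$$\mathrm{Gain}(D, A) = H(D) - \sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|}\, H(D_v)$$
where $D_v$ is the subset of $D$ in which attribute $A$ takes the value $v$.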
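As a minimal sketch of how ID3 uses entropy and information gain, the code below computes both and builds a tree recursively. The assumptions are mine, not the notes': the data is a variant of the familiar "play tennis" textbook toy table (temperature column left out for brevity), the names `entropy`, `info_gain`, and `id3` are illustrative, and a leaf falls back to the majority class when no attributes remain.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(D) = -sum_k p_k * log2(p_k), with p_k the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target):
    """Entropy before the split minus the weighted entropy after splitting on attr."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        after += len(subset) / len(rows) * entropy(subset)
    return before - after

def id3(rows, attrs, target):
    """Return a nested dict {attr: {value: subtree}}, or a class label at a leaf."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # pure subset or majority class
    best = max(attrs, key=lambda a: info_gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}

# Toy data (an assumption for illustration, not from the original notes).
COLUMNS = ("outlook", "humidity", "wind", "play")
RAW = [
    ("sunny", "high", "weak", "no"), ("sunny", "high", "strong", "no"),
    ("overcast", "high", "weak", "yes"), ("rain", "high", "weak", "yes"),
    ("rain", "normal", "weak", "yes"), ("rain", "normal", "strong", "no"),
    ("overcast", "normal", "strong", "yes"), ("sunny", "high", "weak", "no"),
    ("sunny", "normal", "weak", "yes"), ("rain", "normal", "weak", "yes"),
    ("sunny", "normal", "strong", "yes"), ("overcast", "high", "strong", "yes"),
    ("overcast", "normal", "weak", "yes"), ("rain", "high", "strong", "no"),
]
ROWS = [dict(zip(COLUMNS, r)) for r in RAW]

print(entropy([r["play"] for r in ROWS]))    # about 0.940 (9 yes, 5 no)
print(info_gain(ROWS, "outlook", "play"))    # about 0.247
tree = id3(ROWS, ["outlook", "humidity", "wind"], "play")
print(tree)
```

On this data the root test is outlook; the sunny branch is then split on humidity, the rain branch on wind, and overcast is already a pure leaf.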
7. C4.5 Algorithm
Characteristics:
- Handles both continuous and discrete attribute data
- Uses the information gain ratio as the attribute selection criterion of the decision tree
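One common way to handle a continuous attribute is to turn it into a binary test `value <= t` and search for a good threshold `t`. The sketch below assumes candidate thresholds at the midpoints between adjacent distinct sorted values and scores them with plain information gain for brevity; actual C4.5 implementations differ in details, and the ages and labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of the class distribution in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the binary split `value <= threshold`
    with the highest information gain, or (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        t = (lo + hi) / 2
        left = [y for v, y in pairs if v <= t]
        right = [y for v, y in pairs if v > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - weighted > best[1]:
            best = (t, base - weighted)
    return best

# Hypothetical continuous attribute (age) and class labels.
ages = [22, 25, 30, 35, 40, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought)
print(threshold, gain)
```

Here the split `age <= 32.5` separates the two classes perfectly, so it is the threshold chosen.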
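Split information:
$$\mathrm{SplitInfo}(D, A) = -\sum_{v \in \mathrm{Values}(A)} \frac{|D_v|}{|D|} \log_2 \frac{|D_v|}{|D|}$$
where $D_v$ is the subset of data set $D$ in which attribute $A$ takes the value $v$.
Information gain ratio:
$$\mathrm{GainRatio}(D, A) = \frac{\mathrm{Gain}(D, A)}{\mathrm{SplitInfo}(D, A)}$$
where $\mathrm{Gain}(D, A)$ is the information gain of attribute $A$ on $D$.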
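A small sketch of the gain ratio computation may help here. It treats split information as the entropy of the attribute's own value distribution and divides the information gain by it. The function names and the toy counts (a textbook-style weather attribute with 14 samples) are assumptions for illustration only.

```python
from collections import Counter
from math import log2

def entropy(items):
    """Shannon entropy (base 2) of the value distribution in items."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())

def gain_ratio(attr_values, labels):
    """Information gain of the attribute divided by its split information."""
    n = len(labels)
    after = 0.0
    for v in set(attr_values):
        subset = [y for a, y in zip(attr_values, labels) if a == v]
        after += len(subset) / n * entropy(subset)
    gain = entropy(labels) - after
    split_info = entropy(attr_values)  # entropy of the partition sizes themselves
    return gain / split_info if split_info > 0 else 0.0

# Toy counts (assumed): sunny 2 yes / 3 no, overcast 4 yes, rain 3 yes / 2 no.
outlook = ["sunny"] * 5 + ["overcast"] * 4 + ["rain"] * 5
play = ["yes"] * 2 + ["no"] * 3 + ["yes"] * 4 + ["yes"] * 3 + ["no"] * 2

print(gain_ratio(outlook, play))  # about 0.156 (gain 0.247 / split info 1.577)
```

Dividing by the split information penalizes attributes that break the data into many small partitions, which plain information gain tends to favor.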

8. CART Algorithm
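Gini index (Gini coefficient):
$$\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^2$$
where $p_k$ is the proportion of samples in data set $D$ that belong to class $k$.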
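As a rough illustration, the Gini index of a label set and the weighted Gini index of a binary split (CART builds binary trees) can be computed as follows. The function names and the toy label counts are assumptions made for this sketch.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k p_k^2, with p_k the class proportions in labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Weighted Gini index of a binary split into left and right subsets."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy labels (assumed): 9 "yes" and 5 "no" samples in total.
labels = ["yes"] * 9 + ["no"] * 5
# Candidate binary split: one side pure, the other evenly mixed.
left = ["yes"] * 4
right = ["yes"] * 5 + ["no"] * 5

print(gini(labels))             # about 0.459
print(gini_split(left, right))  # about 0.357
```

A lower weighted Gini index after the split means purer subsets, so among candidate splits the one with the smallest value is preferred.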
