[Data Mining] Task 3: Decision Tree Classification
2022-07-03 01:38:00 【zstar-_】
Requirement
The weather factors are temperature, humidity, and wind. Using the given data, train a decision tree classifier and output a rule tree that relates the weather to whether exercise is suitable or not.
The training and test sets can be defined freely. In addition, temperature and humidity must be generalized, that is, the numeric values are replaced with general categories. For example, temperature becomes hot, mild, or cool, and humidity becomes high or medium.
from sklearn import tree
from sklearn.model_selection import train_test_split
import pandas as pd
import graphviz
import numpy as np
Data preprocessing
Reading the data
df = pd.read_excel('data.xlsx', index_col=None)
df
index | The weather | temperature | humidity | The wind | motion |
---|---|---|---|---|---|
0 | Fine | 85 | 85 | nothing | Not suitable for |
1 | Fine | 80 | 90 | Yes | Not suitable for |
2 | cloudy | 83 | 78 | nothing | fit |
3 | There is rain | 70 | 96 | nothing | fit |
4 | There is rain | 68 | 80 | nothing | fit |
5 | There is rain | 65 | 70 | Yes | Not suitable for |
6 | cloudy | 64 | 65 | Yes | fit |
7 | Fine | 72 | 95 | nothing | Not suitable for |
8 | Fine | 69 | 70 | nothing | fit |
9 | There is rain | 75 | 80 | nothing | fit |
10 | Fine | 75 | 70 | Yes | fit |
11 | cloudy | 72 | 90 | Yes | fit |
12 | cloudy | 81 | 75 | nothing | fit |
13 | There is rain | 71 | 80 | Yes | Not suitable for |
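Since data.xlsx is not included with the post, the table above can also be typed in by hand. The following is a minimal sketch that rebuilds the same 14 rows. The names `rows` and `df_manual` are illustrative, and the column names and labels are copied from the table as displayed here, so they may differ from the headers in the actual spreadsheet.

```python
import pandas as pd

# Column names and labels as they appear in the table above (assumed, not read from data.xlsx).
columns = ["The weather", "temperature", "humidity", "The wind", "motion"]
rows = [
    ("Fine", 85, 85, "nothing", "Not suitable for"),
    ("Fine", 80, 90, "Yes", "Not suitable for"),
    ("cloudy", 83, 78, "nothing", "fit"),
    ("There is rain", 70, 96, "nothing", "fit"),
    ("There is rain", 68, 80, "nothing", "fit"),
    ("There is rain", 65, 70, "Yes", "Not suitable for"),
    ("cloudy", 64, 65, "Yes", "fit"),
    ("Fine", 72, 95, "nothing", "Not suitable for"),
    ("Fine", 69, 70, "nothing", "fit"),
    ("There is rain", 75, 80, "nothing", "fit"),
    ("Fine", 75, 70, "Yes", "fit"),
    ("cloudy", 72, 90, "Yes", "fit"),
    ("cloudy", 81, 75, "nothing", "fit"),
    ("There is rain", 71, 80, "Yes", "Not suitable for"),
]

df_manual = pd.DataFrame(rows, columns=columns)

# Class balance of the target column
print(df_manual["motion"].value_counts())
```

The target is imbalanced (9 "fit" rows against 5 "Not suitable for" rows), which is worth keeping in mind when reading an accuracy figure from a very small test set.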
Encoding the text columns
Before the decision tree can be fitted, the text-valued columns need to be encoded as numbers. The mapping is as follows:
The weather: Fine = 0, cloudy = 1, There is rain = 2
The wind: nothing = 0, Yes = 1
motion: Not suitable for = 0, fit = 1
# Replace each text label with its integer code
df['The weather'] = df['The weather'].replace("Fine", 0)
df['The weather'] = df['The weather'].replace("cloudy", 1)
df['The weather'] = df['The weather'].replace("There is rain", 2)
df['The wind'] = df['The wind'].replace("nothing", 0)
df['The wind'] = df['The wind'].replace("Yes", 1)
df['motion'] = df['motion'].replace("Not suitable for", 0)
df['motion'] = df['motion'].replace("fit", 1)
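The same encoding can be written more compactly with one lookup dictionary per column and `Series.map`. This is only a sketch on a three-row demo frame. The names `df_demo`, `code_tables`, and `encoded` are hypothetical, and the labels are assumed to match the table shown earlier.

```python
import pandas as pd

# Small demo frame using the labels from the table above.
df_demo = pd.DataFrame({
    "The weather": ["Fine", "cloudy", "There is rain"],
    "The wind": ["nothing", "Yes", "nothing"],
    "motion": ["Not suitable for", "fit", "fit"],
})

# One label-to-code dictionary per text column.
code_tables = {
    "The weather": {"Fine": 0, "cloudy": 1, "There is rain": 2},
    "The wind": {"nothing": 0, "Yes": 1},
    "motion": {"Not suitable for": 0, "fit": 1},
}

encoded = df_demo.copy()
for column, codes in code_tables.items():
    # map() looks up every value; labels missing from the dictionary become NaN
    encoded[column] = encoded[column].map(codes)

print(encoded)
```

One difference from chained `replace` calls is that `map` turns any label missing from the dictionary into NaN, so a misspelled label shows up immediately instead of passing through as text.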
Temperature and humidity generalization
As the task requires, the temperature and humidity values are replaced with general categories, which are in turn encoded as numbers. The rules are:
temperature: below 70 = cool = 0, 70 to below 80 = mild = 1, 80 and above = hot = 2
humidity: above 80 = high = 1, 80 or below = medium = 0
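# The three steps run in sequence; this is safe because the codes 0, 1, 2
# lie outside every temperature range tested afterwards.
df['temperature'] = np.where(df['temperature'] < 70, 0, df['temperature'])
df['temperature'] = np.where((df['temperature'] < 80) & (df['temperature'] >= 70), 1, df['temperature'])
df['temperature'] = np.where(df['temperature'] >= 80, 2, df['temperature'])
# Humidity has only two categories: high (1) and medium (0)
df['humidity'] = np.where(df['humidity'] > 80, 1, 0)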
The converted data is shown in the following table:
df
index | The weather | temperature | humidity | The wind | motion |
---|---|---|---|---|---|
0 | 0 | 2 | 1 | 0 | 0 |
1 | 0 | 2 | 1 | 1 | 0 |
2 | 1 | 2 | 0 | 0 | 1 |
3 | 2 | 1 | 1 | 0 | 1 |
4 | 2 | 0 | 0 | 0 | 1 |
5 | 2 | 0 | 0 | 1 | 0 |
6 | 1 | 0 | 0 | 1 | 1 |
7 | 0 | 1 | 1 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 1 |
9 | 2 | 1 | 0 | 0 | 1 |
10 | 0 | 1 | 0 | 1 | 1 |
11 | 1 | 1 | 1 | 1 | 1 |
12 | 1 | 2 | 0 | 0 | 1 |
13 | 2 | 1 | 0 | 1 | 0 |
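As a cross-check on the generalization rules, the same binning can be sketched with `pandas.cut`, which states the bin edges in one place. The raw readings below are copied from the original table. The variable names are illustrative, and this is an alternative formulation rather than the code used in the post.

```python
import numpy as np
import pandas as pd

# Raw readings copied from the original table, in row order.
temperature = pd.Series([85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71])
humidity = pd.Series([85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80])

# right=False makes the bins closed on the left: [-inf, 70), [70, 80), [80, inf)
temp_code = pd.cut(
    temperature, bins=[-np.inf, 70, 80, np.inf], labels=[0, 1, 2], right=False
).astype(int)

# The default right=True gives (-inf, 80] and (80, inf), i.e. "above 80" is high
hum_code = pd.cut(humidity, bins=[-np.inf, 80, np.inf], labels=[0, 1]).astype(int)

print(temp_code.tolist())
print(hum_code.tolist())
```

The boundary cases are the ones to watch: a temperature of exactly 80 falls in the hot bin and a humidity of exactly 80 falls in the medium bin, matching the converted table.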
Data set partitioning
The data is split into a training set and a test set in a 7:3 ratio. With 14 samples, test_size=0.3 gives 9 training samples and 5 test samples.
# Features and target as NumPy arrays
data = df[['The weather', 'temperature', 'humidity', 'The wind']]
target = df['motion']
data = np.array(data)
target = np.array(target)
# No random_state is set, so the split differs from run to run
Xtrain, Xtest, Ytrain, Ytest = train_test_split(data, target, test_size=0.3)
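To make the split reproducible and to confirm the resulting set sizes, a seed can be passed to `train_test_split`. The sketch below uses the converted table from above. The value `random_state=0` is an arbitrary assumption and is not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Converted table from above: weather, temperature, humidity, wind
X = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0],
    [2, 0, 0, 0], [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0],
    [0, 0, 0, 0], [2, 1, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1],
    [1, 2, 0, 0], [2, 1, 0, 1],
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# random_state fixes the shuffle so repeated runs give the same split (assumed seed)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(len(X_train), len(X_test))  # prints: 9 5
```

With only 5 test samples, each misclassified sample moves the accuracy by 20 percentage points, so a single split gives a rough estimate at best.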
Decision tree construction
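The decision tree uses the Gini index as its splitting criterion. In the run shown here, the classification accuracy on the test set is 60% (3 of the 5 test samples). Because neither the split nor the classifier is seeded, the score varies between runs.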
clf = tree.DecisionTreeClassifier(criterion="gini")
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
print(score)
0.6
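For intuition about what criterion="gini" measures, the Gini impurity can be computed by hand. The sketch below evaluates the impurity of the target over all 14 rows and the weighted impurity after one candidate split on the weather column. The helper names `gini_impurity` and `weighted_gini` are hypothetical, and the numbers are for the full table, not for the 9-row training subset the classifier actually sees.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) of a 1-D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def weighted_gini(labels, mask):
    """Size-weighted impurity of the two groups produced by a boolean mask."""
    left, right = labels[mask], labels[~mask]
    n = len(labels)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)

# Encoded columns from the converted table
motion = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
weather = np.array([0, 0, 1, 2, 2, 2, 1, 0, 0, 2, 0, 1, 1, 2])

root = gini_impurity(motion)                    # 5 vs 9 samples: 1 - (5/14)^2 - (9/14)^2
split = weighted_gini(motion, weather <= 0.5)   # "Fine" rows vs all other rows

print(round(root, 4), round(split, 4))
```

A split is attractive when the weighted impurity after it is lower than the impurity before it. The tree builder compares candidate thresholds on every feature in this way and keeps the best one.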
Visualization results
feature_name = ['The weather', 'temperature', 'humidity', 'The wind']
# With no out_file argument, export_graphviz returns the DOT source as a string
dot_data = tree.export_graphviz(clf, feature_names=feature_name,
                                class_names=["Not suitable for", "fit"],
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data.replace(
'helvetica', '"Microsoft YaHei"'), encoding='utf-8')
graph.view()
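Since the task asks for a rule tree, the fitted model can also be printed as plain-text rules with `tree.export_text`, which does not need Graphviz to be installed. The sketch below is fitted on all 14 converted rows with an assumed `random_state=0`, so the tree it prints will generally differ from the one trained on the random 9-row subset above.

```python
import numpy as np
from sklearn import tree

# Converted table from above: weather, temperature, humidity, wind
X = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0],
    [2, 0, 0, 0], [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0],
    [0, 0, 0, 0], [2, 1, 0, 0], [0, 1, 0, 1], [1, 1, 1, 1],
    [1, 2, 0, 0], [2, 1, 0, 1],
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# Fit on the full table (an assumption made for illustration only)
clf_demo = tree.DecisionTreeClassifier(criterion="gini", random_state=0)
clf_demo.fit(X, y)

# Indented if/else rules; class 0 = "Not suitable for", class 1 = "fit"
rules = tree.export_text(
    clf_demo,
    feature_names=["The weather", "temperature", "humidity", "The wind"],
)
print(rules)
```

Each leaf line ends in `class: 0` or `class: 1`. Thresholds such as `<= 0.50` come from splitting the integer codes, so `The weather <= 0.50` should be read as "the weather is Fine".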