[data mining] task 3: decision tree classification
2022-07-03 01:38:00 【zstar-_】
Requirements
The weather factors include temperature, humidity, and wind. Given the data, use a decision tree algorithm to learn a classifier, and output a rule tree that relates the weather to whether or not a day is suitable for exercise.
The training and test sets can be defined freely. In addition, temperature and humidity need to be generalized, that is, the numeric values are replaced with general categories: for example, temperature becomes hot, warm, or cool, and humidity becomes high or medium.
from sklearn import tree
from sklearn.model_selection import train_test_split
import pandas as pd
import graphviz
import numpy as np
Data preprocessing
Loading the data
df = pd.read_excel('data.xlsx', index_col=None)
df
| | The weather | temperature | humidity | The wind | motion |
|---|---|---|---|---|---|
| 0 | Fine | 85 | 85 | nothing | Not suitable for |
| 1 | Fine | 80 | 90 | Yes | Not suitable for |
| 2 | cloudy | 83 | 78 | nothing | fit |
| 3 | There is rain | 70 | 96 | nothing | fit |
| 4 | There is rain | 68 | 80 | nothing | fit |
| 5 | There is rain | 65 | 70 | Yes | Not suitable for |
| 6 | cloudy | 64 | 65 | Yes | fit |
| 7 | Fine | 72 | 95 | nothing | Not suitable for |
| 8 | Fine | 69 | 70 | nothing | fit |
| 9 | There is rain | 75 | 80 | nothing | fit |
| 10 | Fine | 75 | 70 | Yes | fit |
| 11 | cloudy | 72 | 90 | Yes | fit |
| 12 | cloudy | 81 | 75 | nothing | fit |
| 13 | There is rain | 71 | 80 | Yes | Not suitable for |
Encoding the text attributes
The decision tree computations that follow need numeric inputs, so the text attributes are encoded as numbers. The mapping is:
The weather: Fine = 0, cloudy = 1, There is rain = 2
The wind: nothing (no wind) = 0, Yes = 1
motion (suitability for exercise): Not suitable for = 0, fit = 1
# Replace each text label with its numeric code (labels as shown in the table above)
df['The weather'] = df['The weather'].replace("Fine", 0)
df['The weather'] = df['The weather'].replace("cloudy", 1)
df['The weather'] = df['The weather'].replace("There is rain", 2)
df['The wind'] = df['The wind'].replace("nothing", 0)
df['The wind'] = df['The wind'].replace("Yes", 1)
df['motion'] = df['motion'].replace("Not suitable for", 0)
df['motion'] = df['motion'].replace("fit", 1)
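The same encoding can also be written as one dictionary per column and applied with `Series.map`. This is a minimal sketch, not the code used in the article: the three-row frame `sample` and the `*_codes` names are made up for illustration, and the column names and labels are assumed to match the table above.

```python
import pandas as pd

# Hypothetical three-row sample with the same text columns as the table above.
sample = pd.DataFrame({
    "The weather": ["Fine", "cloudy", "There is rain"],
    "The wind": ["nothing", "Yes", "nothing"],
    "motion": ["Not suitable for", "fit", "fit"],
})

# One mapping per column (illustrative names, same codes as in the text).
weather_codes = {"Fine": 0, "cloudy": 1, "There is rain": 2}
wind_codes = {"nothing": 0, "Yes": 1}
motion_codes = {"Not suitable for": 0, "fit": 1}

encoded = sample.copy()
encoded["The weather"] = sample["The weather"].map(weather_codes)
encoded["The wind"] = sample["The wind"].map(wind_codes)
encoded["motion"] = sample["motion"].map(motion_codes)
print(encoded)
```

One practical difference from `replace`: `map` turns any label that is missing from the dictionary into `NaN`, so a misspelled label shows up immediately instead of passing through unchanged.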
Temperature and humidity generalization
The task requires replacing the temperature and humidity values with general categories. Here each category is also encoded as a number, using the following rules:
temperature: < 70 is cool (0), 70 to 79 is warm (1), >= 80 is hot (2)
humidity: > 80 is high (1), <= 80 is medium (0)
# The three steps run in order; the codes 0 and 1 written by earlier steps
# fall below the later thresholds, so they are not overwritten.
df['temperature'] = np.where(df['temperature'] < 70, 0, df['temperature'])  # cool
df['temperature'] = np.where((df['temperature'] < 80) & (df['temperature'] >= 70), 1, df['temperature'])  # warm
df['temperature'] = np.where(df['temperature'] >= 80, 2, df['temperature'])  # hot
df['humidity'] = np.where(df['humidity'] > 80, 1, 0)  # high = 1, medium = 0
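The task statement asks for general categories such as hot, warm, and cool, so it can help to keep the text labels next to the numeric codes. The sketch below uses `pd.cut` in place of the chained `np.where` calls; the bin edges follow the rules above, while the sample values (taken from a few rows of the table) and the variable names are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# A few temperature and humidity readings from the table above.
temperature = pd.Series([85, 80, 70, 68, 64])
humidity = pd.Series([85, 90, 96, 80, 65])

# right=False makes the bins [-inf, 70), [70, 80), [80, inf).
temp_label = pd.cut(temperature, bins=[-np.inf, 70, 80, np.inf],
                    right=False, labels=["cool", "warm", "hot"])
# Default right=True makes the bins (-inf, 80], (80, inf).
hum_label = pd.cut(humidity, bins=[-np.inf, 80, np.inf],
                   labels=["medium", "high"])

print(temp_label.tolist())            # ['hot', 'hot', 'warm', 'cool', 'cool']
print(temp_label.cat.codes.tolist())  # [2, 2, 1, 0, 0]
print(hum_label.tolist())             # ['high', 'high', 'high', 'medium', 'medium']
```

The category order in `labels` determines the integer codes, so `cool`, `warm`, `hot` map to 0, 1, 2, matching the encoding used in the article.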
The converted data is shown in the following table:
df
| | The weather | temperature | humidity | The wind | motion |
|---|---|---|---|---|---|
| 0 | 0 | 2 | 1 | 0 | 0 |
| 1 | 0 | 2 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0 | 0 | 1 |
| 3 | 2 | 1 | 1 | 0 | 1 |
| 4 | 2 | 0 | 0 | 0 | 1 |
| 5 | 2 | 0 | 0 | 1 | 0 |
| 6 | 1 | 0 | 0 | 1 | 1 |
| 7 | 0 | 1 | 1 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 1 |
| 9 | 2 | 1 | 0 | 0 | 1 |
| 10 | 0 | 1 | 0 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 |
| 12 | 1 | 2 | 0 | 0 | 1 |
| 13 | 2 | 1 | 0 | 1 | 0 |
Splitting the dataset
The data is split into a training set and a test set at a 7:3 ratio.
# Features are the four weather attributes; the label is the motion column
data = df[['The weather', 'temperature', 'humidity', 'The wind']]
target = df['motion']
data = np.array(data)
target = np.array(target)
# Hold out 30% of the rows as the test set
Xtrain, Xtest, Ytrain, Ytest = train_test_split(data, target, test_size=0.3)
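To see what a 7:3 split means for 14 rows, and how to make it repeatable, here is a small sketch. The feature matrix is a placeholder and `random_state=0` is an arbitrary seed picked for this example; neither comes from the article.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features: 14 rows and 4 columns, the same shape as the weather data.
X = np.arange(14 * 4).reshape(14, 4)
# The motion column of the converted table (1 = fit, 0 = not suitable).
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# random_state fixes the shuffle, so the same rows land in the test set every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
print(X_train.shape, X_test.shape)  # (9, 4) (5, 4)

# Repeating the call with the same seed reproduces the split exactly.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.3, random_state=0)
print(np.array_equal(X_test, X_test_again))  # True
```

With only 5 test rows, each misclassified sample moves the accuracy by 20 percentage points, so scores on a split this small should be read with caution.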
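Decision tree construction
The split criterion used here is the Gini index. In the run shown, the classification accuracy on the test set is 60%. Because train_test_split is called without a fixed random_state, the split, and therefore the score, can differ from run to run.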
clf = tree.DecisionTreeClassifier(criterion="gini")
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
print(score)
0.6
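As a worked illustration of the Gini criterion, the sketch below computes the Gini impurity of the motion labels and the weighted impurity after splitting on the wind attribute. It uses all 14 rows of the converted table rather than the random training subset, and the helper `gini` is written for this example only; it is not scikit-learn's internal implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# The wind and motion columns of the converted table, rows 0 to 13.
wind = [0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1]
motion = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]

# Impurity before any split: 9 "fit" and 5 "not suitable" labels.
print(round(gini(motion), 3))  # 0.459

# Split on wind and weight each branch by its share of the rows.
no_wind = [m for w, m in zip(wind, motion) if w == 0]
windy = [m for w, m in zip(wind, motion) if w == 1]
weighted = (len(no_wind) * gini(no_wind) + len(windy) * gini(windy)) / len(motion)
print(round(weighted, 3))  # 0.429
```

The split lowers the impurity from about 0.459 to about 0.429. A tree built with the Gini criterion compares candidate splits by this kind of weighted impurity and prefers the one that reduces it most.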
Visualization results
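feature_name = ['The weather', 'temperature', 'humidity', 'The wind']
# Export the fitted tree in Graphviz DOT format
dot_data = tree.export_graphviz(clf, feature_names=feature_name,
                                class_names=["Not suitable for", "fit"],
                                filled=True, rounded=True)
# Replace the default helvetica font with Microsoft YaHei, a font that includes Chinese glyphs
graph = graphviz.Source(dot_data.replace('helvetica', '"Microsoft YaHei"'), encoding='utf-8')
# Render the graph and open it in the default viewer
graph.view()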
