[Data Mining] Task 3: Decision Tree Classification
2022-07-03 01:09:00 【zstar-_】
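Requirements
Requirement: weather factors include temperature, humidity, wind and so on. Given the data, use a decision tree algorithm to learn a classifier and output the rule tree that relates the weather to whether a person exercises or not.
The training and test sets may be defined freely. In addition, temperature and humidity must be generalized, turning the numeric values into descriptive categories, for example temperature into hot, warm and cool, and humidity into high and medium.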
from sklearn import tree
from sklearn.model_selection import train_test_split
import pandas as pd
import graphviz
import numpy as np
Data preprocessing
Reading the data
df = pd.read_excel('data.xlsx', index_col=None)
df
index | 天气 | 温度 | 湿度 | 风况 | 运动 |
---|---|---|---|---|---|
0 | 晴 | 85 | 85 | 无 | 不适合 |
1 | 晴 | 80 | 90 | 有 | 不适合 |
2 | 多云 | 83 | 78 | 无 | 适合 |
3 | 有雨 | 70 | 96 | 无 | 适合 |
4 | 有雨 | 68 | 80 | 无 | 适合 |
5 | 有雨 | 65 | 70 | 有 | 不适合 |
6 | 多云 | 64 | 65 | 有 | 适合 |
7 | 晴 | 72 | 95 | 无 | 不适合 |
8 | 晴 | 69 | 70 | 无 | 适合 |
9 | 有雨 | 75 | 80 | 无 | 适合 |
10 | 晴 | 75 | 70 | 有 | 适合 |
11 | 多云 | 72 | 90 | 有 | 适合 |
12 | 多云 | 81 | 75 | 无 | 适合 |
13 | 有雨 | 71 | 80 | 有 | 不适合 |
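Quantifying the text features
The decision tree needs numeric inputs, so the text features are converted to numbers as follows:
天气 (weather): 晴 (sunny) → 0, 多云 (cloudy) → 1, 有雨 (rainy) → 2
风况 (wind): 无 (none) → 0, 有 (windy) → 1
运动 (exercise): 不适合 (unsuitable) → 0, 适合 (suitable) → 1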
df['天气'] = df['天气'].replace("晴", 0)
df['天气'] = df['天气'].replace("多云", 1)
df['天气'] = df['天气'].replace("有雨", 2)
df['风况'] = df['风况'].replace("无", 0)
df['风况'] = df['风况'].replace("有", 1)
df['运动'] = df['运动'].replace("不适合", 0)
df['运动'] = df['运动'].replace("适合", 1)
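The same encoding can also be written with one dictionary per column. The following is only a sketch of that alternative, not the code used in this post: `sample` is a small hypothetical frame standing in for the contents of data.xlsx, and `encodings` and `encoded` are names chosen for illustration.

```python
import pandas as pd

# Small hypothetical sample with the same text columns as data.xlsx
sample = pd.DataFrame({
    "天气": ["晴", "多云", "有雨"],
    "风况": ["无", "有", "无"],
    "运动": ["不适合", "适合", "适合"],
})

# One dictionary per column keeps the whole encoding in a single place
encodings = {
    "天气": {"晴": 0, "多云": 1, "有雨": 2},
    "风况": {"无": 0, "有": 1},
    "运动": {"不适合": 0, "适合": 1},
}

encoded = sample.copy()
for column, mapping in encodings.items():
    # Series.map yields NaN for any value missing from the mapping,
    # which makes typos in the raw data easy to spot
    encoded[column] = sample[column].map(mapping)

print(encoded)
```

Unlike a chain of `replace` calls, `map` does not silently pass unknown labels through, so an unexpected value shows up as NaN instead of staying as text.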
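Generalizing temperature and humidity
The task asks for the temperature and humidity values to be turned into descriptive categories. Here they are generalized and then encoded as numbers, using the following rules:
温度 (temperature): below 70 → cool → 0; from 70 up to but not including 80 → warm → 1; 80 and above → hot → 2
湿度 (humidity): above 80 → high → 1; 80 or below → medium → 0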
df['温度'] = np.where(df['温度'] < 70, 0, df['温度'])
df['温度'] = np.where((df['温度'] < 80) & (df['温度'] >= 70), 1, df['温度'])
df['温度'] = np.where(df['温度'] >= 80, 2, df['温度'])
df['湿度'] = np.where(df['湿度'] > 80, 1, 0)
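The same binning can be expressed with `pandas.cut`, which states the bin edges explicitly. This is a sketch rather than the code used in this post. The two series repeat the temperature and humidity columns from the table above, and `temp_code` and `hum_code` are illustrative names.

```python
import numpy as np
import pandas as pd

# Temperature and humidity columns copied from the table above
temperature = pd.Series([85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71])
humidity = pd.Series([85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80])

# right=False closes each bin on the left: [-inf, 70), [70, 80), [80, inf)
temp_code = pd.cut(
    temperature, bins=[-np.inf, 70, 80, np.inf], labels=[0, 1, 2], right=False
).astype(int)

# The default right=True closes each bin on the right: (-inf, 80], (80, inf)
hum_code = pd.cut(humidity, bins=[-np.inf, 80, np.inf], labels=[0, 1]).astype(int)

print(temp_code.tolist())
print(hum_code.tolist())
```

Writing the edges once avoids the order dependence of chained `np.where` calls, where an already encoded value could be re-encoded if the steps were rearranged.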
The converted data is shown in the table below:
df
index | 天气 | 温度 | 湿度 | 风况 | 运动 |
---|---|---|---|---|---|
0 | 0 | 2 | 1 | 0 | 0 |
1 | 0 | 2 | 1 | 1 | 0 |
2 | 1 | 2 | 0 | 0 | 1 |
3 | 2 | 1 | 1 | 0 | 1 |
4 | 2 | 0 | 0 | 0 | 1 |
5 | 2 | 0 | 0 | 1 | 0 |
6 | 1 | 0 | 0 | 1 | 1 |
7 | 0 | 1 | 1 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 1 |
9 | 2 | 1 | 0 | 0 | 1 |
10 | 0 | 1 | 0 | 1 | 1 |
11 | 1 | 1 | 1 | 1 | 1 |
12 | 1 | 2 | 0 | 0 | 1 |
13 | 2 | 1 | 0 | 1 | 0 |
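Splitting the dataset
Split the data into a training set and a test set at a 7:3 ratio. With 14 rows and test_size=0.3, this gives 9 training samples and 5 test samples.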
data = df[['天气', '温度', '湿度', '风况']]
target = df['运动']
data = np.array(data)
target = np.array(target)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(data, target, test_size=0.3)
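The split above is random, so every run produces a different partition. A minimal sketch of a reproducible variant follows. The `random_state=0` seed and the `stratify=target` option are assumptions added here for illustration and are not part of the original code. The arrays repeat the converted table above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Encoded features (天气, 温度, 湿度, 风况) copied from the converted table
data = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0], [2, 0, 0, 0],
    [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 0], [2, 1, 0, 0],
    [0, 1, 0, 1], [1, 1, 1, 1], [1, 2, 0, 0], [2, 1, 0, 1],
])
# Encoded labels (运动): 0 = 不适合, 1 = 适合
target = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# random_state fixes the shuffle; stratify keeps the class ratio
# similar in the training and test sets
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    data, target, test_size=0.3, random_state=0, stratify=target
)

print(Xtrain.shape, Xtest.shape)
```

With only 14 rows, stratifying helps ensure that both classes appear in the 5-row test set.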
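Building the decision tree
The decision tree uses the Gini index as its splitting criterion. In the run shown here, the classification accuracy on the test set is 60%. Because neither train_test_split nor DecisionTreeClassifier is given a random_state, the score can differ from run to run.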
clf = tree.DecisionTreeClassifier(criterion="gini")
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
print(score)
0.6
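With only 5 test rows, each misclassified sample moves the score by 20 percentage points, so a single split gives a noisy estimate. One possible alternative, sketched here and not used in the original post, is leave-one-out cross-validation over all 14 rows. The arrays repeat the converted table above, and `random_state=0` is an assumption added for repeatability.

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Encoded features (天气, 温度, 湿度, 风况) copied from the converted table
X = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0], [2, 0, 0, 0],
    [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 0], [2, 1, 0, 0],
    [0, 1, 0, 1], [1, 1, 1, 1], [1, 2, 0, 0], [2, 1, 0, 1],
])
# Encoded labels (运动): 0 = 不适合, 1 = 适合
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

clf = tree.DecisionTreeClassifier(criterion="gini", random_state=0)

# Each fold trains on 13 rows and tests on the remaining one,
# so every per-fold score is either 0.0 or 1.0
scores = cross_val_score(clf, X, y, cv=LeaveOneOut())

print("mean accuracy:", scores.mean())
```

The mean of the 14 fold scores uses every row for testing exactly once, which makes it less dependent on one particular split.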
Visualizing the result
feature_name = ['天气', '温度', '湿度', '风况']
dot_data = tree.export_graphviz(clf, feature_names=feature_name,
                                class_names=["不适合", "适合"],
                                filled=True, rounded=True)
graph = graphviz.Source(dot_data.replace(
'helvetica', '"Microsoft YaHei"'), encoding='utf-8')
graph.view()
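The requirement asks for the rule tree as output. If Graphviz is not installed, the rules can also be printed as plain text with `tree.export_text`. The sketch below fits a tree on all 14 rows purely for illustration, so its rules may differ from the tree trained on the random 70% split above. `random_state=0` is an assumption added for repeatability.

```python
import numpy as np
from sklearn import tree

feature_name = ["天气", "温度", "湿度", "风况"]

# Encoded features and labels copied from the converted table
X = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0], [2, 0, 0, 0],
    [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 0], [2, 1, 0, 0],
    [0, 1, 0, 1], [1, 1, 1, 1], [1, 2, 0, 0], [2, 1, 0, 1],
])
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

clf = tree.DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# export_text renders the fitted tree as indented if/else style rules;
# leaves are shown as "class: 0" (不适合) or "class: 1" (适合)
rules = tree.export_text(clf, feature_names=feature_name)
print(rules)
```

Each indented line is a threshold test on one encoded feature, so the printed output can be read directly as the weather rules for exercising.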