[Data Mining] Task 3: Decision Tree Classification
2022-07-03 01:09:00 【zstar-_】
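Requirements
Requirement: weather factors include temperature, humidity, wind, and so on. Given the data, use a decision tree algorithm to learn a classifier and output a rule tree that relates the weather to whether a person exercises or not.
The training and test sets may be defined freely. In addition, temperature and humidity must be generalized, turning the numeric values into descriptive categories: for example, temperature becomes hot, warm, or cool, and humidity becomes high or medium.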
from sklearn import tree
from sklearn.model_selection import train_test_split
import pandas as pd
import graphviz
import numpy as np
Data preprocessing
Reading the data
df = pd.read_excel('data.xlsx', index_col=None)
df
index | 天气 | 温度 | 湿度 | 风况 | 运动 |
---|---|---|---|---|---|
0 | 晴 | 85 | 85 | 无 | 不适合 |
1 | 晴 | 80 | 90 | 有 | 不适合 |
2 | 多云 | 83 | 78 | 无 | 适合 |
3 | 有雨 | 70 | 96 | 无 | 适合 |
4 | 有雨 | 68 | 80 | 无 | 适合 |
5 | 有雨 | 65 | 70 | 有 | 不适合 |
6 | 多云 | 64 | 65 | 有 | 适合 |
7 | 晴 | 72 | 95 | 无 | 不适合 |
8 | 晴 | 69 | 70 | 无 | 适合 |
9 | 有雨 | 75 | 80 | 无 | 适合 |
10 | 晴 | 75 | 70 | 有 | 适合 |
11 | 多云 | 72 | 90 | 有 | 适合 |
12 | 多云 | 81 | 75 | 无 | 适合 |
13 | 有雨 | 71 | 80 | 有 | 不适合 |
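Quantifying the text features
The decision tree needs numeric inputs, so the text-valued columns are converted as follows:
天气 (weather): 晴 (sunny) → 0, 多云 (cloudy) → 1, 有雨 (rainy) → 2
风况 (wind): 无 (no wind) → 0, 有 (windy) → 1
运动 (exercise): 不适合 (unsuitable) → 0, 适合 (suitable) → 1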
df['天气'] = df['天气'].replace("晴", 0)
df['天气'] = df['天气'].replace("多云", 1)
df['天气'] = df['天气'].replace("有雨", 2)
df['风况'] = df['风况'].replace("无", 0)
df['风况'] = df['风况'].replace("有", 1)
df['运动'] = df['运动'].replace("不适合", 0)
df['运动'] = df['运动'].replace("适合", 1)
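The same encoding can also be written with one dictionary per column, which keeps each mapping in a single place. The following is a minimal sketch, not the article's method: the three-row `sample` frame and the `encodings` name are hypothetical and only mirror the column names and labels used above.

```python
import pandas as pd

# Hypothetical three-row sample with the same columns and labels as the article's data.
sample = pd.DataFrame({
    "天气": ["晴", "多云", "有雨"],
    "风况": ["无", "有", "无"],
    "运动": ["不适合", "适合", "适合"],
})

# One dictionary per column, following the encoding rules listed above.
encodings = {
    "天气": {"晴": 0, "多云": 1, "有雨": 2},
    "风况": {"无": 0, "有": 1},
    "运动": {"不适合": 0, "适合": 1},
}

encoded = sample.copy()
for column, mapping in encodings.items():
    # Series.map yields NaN for any label missing from the mapping,
    # so an unexpected or misspelled label shows up immediately.
    encoded[column] = sample[column].map(mapping)

print(encoded)
```

Unlike chained `replace` calls, `map` does not silently pass through labels that are absent from the dictionary, which may make data-entry errors easier to notice.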
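Generalizing temperature and humidity
The task requires turning the temperature and humidity values into descriptive categories. Here they are binned and then encoded as numbers, using these rules:
温度 (temperature): below 70 → cool → 0; from 70 up to but not including 80 → warm → 1; 80 and above → hot → 2 (the code below treats a reading of exactly 80 as hot)
湿度 (humidity): above 80 → high → 1; 80 or below → medium → 0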
df['温度'] = np.where(df['温度'] < 70, 0, df['温度'])
df['温度'] = np.where((df['温度'] < 80) & (df['温度'] >= 70), 1, df['温度'])
df['温度'] = np.where(df['温度'] >= 80, 2, df['温度'])
df['湿度'] = np.where(df['湿度'] > 80, 1, 0)
The converted data is shown in the table below:
df
index | 天气 | 温度 | 湿度 | 风况 | 运动 |
---|---|---|---|---|---|
0 | 0 | 2 | 1 | 0 | 0 |
1 | 0 | 2 | 1 | 1 | 0 |
2 | 1 | 2 | 0 | 0 | 1 |
3 | 2 | 1 | 1 | 0 | 1 |
4 | 2 | 0 | 0 | 0 | 1 |
5 | 2 | 0 | 0 | 1 | 0 |
6 | 1 | 0 | 0 | 1 | 1 |
7 | 0 | 1 | 1 | 0 | 0 |
8 | 0 | 0 | 0 | 0 | 1 |
9 | 2 | 1 | 0 | 0 | 1 |
10 | 0 | 1 | 0 | 1 | 1 |
11 | 1 | 1 | 1 | 1 | 1 |
12 | 1 | 2 | 0 | 0 | 1 |
13 | 2 | 1 | 0 | 1 | 0 |
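The binning rules used for this table can also be expressed with `pandas.cut`, which states the bin edges explicitly instead of overwriting the column step by step. This is a sketch under the assumption that 80 counts as hot for temperature and as medium for humidity, as in the converted table; the sample values and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical sample readings, chosen to include the boundary values 70 and 80.
temps = pd.Series([85, 80, 70, 68, 79])
humidity = pd.Series([85, 80, 96, 70, 81])

inf = float("inf")

# right=False gives half-open bins [low, high):
# below 70 -> 0 (cool), 70 to <80 -> 1 (warm), 80 and above -> 2 (hot).
temp_code = pd.cut(
    temps, bins=[-inf, 70, 80, inf], labels=[0, 1, 2], right=False
).astype(int)

# The default right=True gives bins (low, high]:
# 80 or below -> 0 (medium), above 80 -> 1 (high).
hum_code = pd.cut(humidity, bins=[-inf, 80, inf], labels=[0, 1]).astype(int)

print(temp_code.tolist())  # [2, 2, 1, 0, 1]
print(hum_code.tolist())   # [1, 0, 1, 0, 1]
```

Because the raw column is never overwritten mid-way, the result does not depend on the order of the binning steps.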
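Splitting the dataset
The data is split into a training set and a test set at a ratio of 7:3. With 14 rows and test_size=0.3, train_test_split produces 9 training samples and 5 test samples.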
data = df[['天气', '温度', '湿度', '风况']]
target = df['运动']
data = np.array(data)
target = np.array(target)
Xtrain, Xtest, Ytrain, Ytest = train_test_split(data, target, test_size=0.3)
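The split above is random, so each run may produce a different partition. A reproducible variant could look like the sketch below. The `random_state=0` seed and the use of `stratify` are assumptions added for illustration, not part of the original experiment; the arrays restate the converted table.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Encoded rows from the converted table; columns: 天气, 温度, 湿度, 风况.
X = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0], [2, 0, 0, 0],
    [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 0], [2, 1, 0, 0],
    [0, 1, 0, 1], [1, 1, 1, 1], [1, 2, 0, 0], [2, 1, 0, 1],
])
# Encoded 运动 column: 0 = unsuitable, 1 = suitable.
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

# random_state fixes the shuffle so the same rows land in each set on every run;
# stratify=y keeps the class ratio roughly the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)  # (9, 4) (5, 4)
```

With only 14 rows, stratifying helps avoid a test set that contains a single class, although the estimate from 5 test samples remains coarse either way.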
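Building the decision tree
The Gini index is used as the split criterion. In this run the classification accuracy on the test set was 60%, that is, 3 of the 5 test samples. Because train_test_split is called without a fixed random_state, the partition, and therefore the score, changes from run to run.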
clf = tree.DecisionTreeClassifier(criterion="gini")
clf = clf.fit(Xtrain, Ytrain)
score = clf.score(Xtest, Ytest)
print(score)
0.6
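To make the Gini criterion concrete, the sketch below computes the Gini impurity, 1 minus the sum of squared class proportions, by hand. It is an illustration on all 14 rows of the converted table, not a reproduction of what scikit-learn computed on the 9 training rows of this run; the function and variable names are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k ** 2) over the class proportions p_k in labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


# 运动 and 湿度 columns of the converted table, in row order.
exercise = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0]
humidity = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Impurity before any split: 5 unsuitable and 9 suitable rows.
root = gini_impurity(exercise)

# Impurity after splitting on humidity, weighted by the size of each branch.
high = [e for e, h in zip(exercise, humidity) if h == 1]
medium = [e for e, h in zip(exercise, humidity) if h == 0]
after_split = (len(high) * gini_impurity(high)
               + len(medium) * gini_impurity(medium)) / len(exercise)

print(round(root, 3))         # 0.459
print(round(after_split, 3))  # 0.394
```

A split is attractive to the tree when the weighted impurity of its branches is lower than the impurity of the parent node, as it is here.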
Visualizing the result
feature_name = ['天气', '温度', '湿度', '风况']
dot_data = tree.export_graphviz(
    clf, feature_names=feature_name, class_names=["不适合", "适合"],
    filled=True, rounded=True)
# Replace the default font with one that has Chinese glyphs so the labels render.
graph = graphviz.Source(
    dot_data.replace('helvetica', '"Microsoft YaHei"'), encoding='utf-8')
graph.view()
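Since the task asks for a rule tree, a plain-text rendering may be a useful complement to the Graphviz image. The sketch below uses `sklearn.tree.export_text`. As assumptions for illustration, it fits on all 14 rows of the converted table rather than on the training split, and fixes `random_state=0`, so the printed rules will generally differ from the tree drawn above.

```python
import numpy as np
from sklearn import tree

# Encoded rows from the converted table; columns: 天气, 温度, 湿度, 风况.
X = np.array([
    [0, 2, 1, 0], [0, 2, 1, 1], [1, 2, 0, 0], [2, 1, 1, 0], [2, 0, 0, 0],
    [2, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0], [0, 0, 0, 0], [2, 1, 0, 0],
    [0, 1, 0, 1], [1, 1, 1, 1], [1, 2, 0, 0], [2, 1, 0, 1],
])
# Encoded 运动 column: 0 = unsuitable, 1 = suitable.
y = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])

feature_names = ["天气", "温度", "湿度", "风况"]

# random_state makes tie-breaking between equally good splits repeatable.
clf_all = tree.DecisionTreeClassifier(criterion="gini", random_state=0)
clf_all.fit(X, y)

# export_text returns the tree as indented if/else style rules.
rules = tree.export_text(clf_all, feature_names=feature_names)
print(rules)
```

Each leaf line ends with `class: 0` or `class: 1`, which correspond to 不适合 (unsuitable) and 适合 (suitable) under the encoding used earlier.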