Machine Learning in Action - Decision Trees - 22
2022-07-28 11:51:00 【gemoumou】
Machine Learning in Action - Decision Trees - Leaf Classification

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
train = pd.read_csv('train.csv')

train.head()

train.shape

# Number of distinct leaf species (classes)
len(train.species.unique())

Data Preparation
# Encode the string class labels as integers
lb = LabelEncoder().fit(train.species)
labels = lb.transform(train.species)
# Drop the 'species' and 'id' columns so that only the features remain
data = train.drop(['species', 'id'], axis=1)
data.head()
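To make the encoding step concrete, here is a minimal sketch of what `LabelEncoder` does on a tiny list. The species names are made up for illustration and are not taken from `train.csv`.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical species names, for illustration only
species = ["Quercus_Rubra", "Acer_Rubrum", "Quercus_Rubra", "Betula_Pendula"]

encoder = LabelEncoder().fit(species)
codes = encoder.transform(species)

# Classes are stored in sorted order; each code is an index into classes_
print(list(encoder.classes_))
print(codes.tolist())  # [2, 0, 2, 1]

# Map integer codes (for example, model predictions) back to names
decoded = encoder.inverse_transform([0, 1]).tolist()
print(decoded)
```

Keeping the fitted encoder around (here `encoder`, `lb` in the post) is what lets you turn predicted integers back into readable species names later.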


# Split into training and test sets, keeping the class proportions the same in both
x_train,x_test,y_train,y_test = train_test_split(data, labels, test_size=0.3, stratify=labels)
Modeling
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)


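Model Tuning
# max_depth: maximum depth of the tree
# min_samples_split: minimum number of samples required to split an internal node
# min_samples_leaf: minimum number of samples required at a leaf node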
param_grid = {
'max_depth': [30,40,50,60,70],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
# Grid search with 3-fold cross-validation
model = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model.fit(x_train, y_train)
print(model.best_estimator_)

model.score(x_train, y_train)

model.score(x_test, y_test)
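The two scores above are easier to interpret next to the cross-validated score that the grid search itself optimized. The following is a self-contained sketch on synthetic data (generated with `make_classification`, not the leaf dataset); the grid values and `random_state` settings are assumptions chosen only to keep the example small and repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem standing in for the real data
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

param_grid = {"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_)   # best combination found on the grid
print(search.best_score_)    # mean cross-validated accuracy of that combination

# After fitting, score() uses the best estimator refit on all training data
train_acc = search.score(X_train, y_train)
test_acc = search.score(X_test, y_test)
print(train_acc, test_acc)
```

A training accuracy far above the test accuracy usually suggests the tree is overfitting, in which case smaller `max_depth` or larger `min_samples_leaf` values may be worth adding to the grid.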

Decision Trees - Animal Classification

import pandas as pd
import numpy as np
# pip install missingno
import missingno as msno
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
data = pd.read_csv('zoo.csv')
data.head()

# Check the shape of the data
data.shape

# Check the data type of each column
data.dtypes

data.describe()

# Visualize missing values per column
p=msno.bar(data)
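
If `missingno` is not installed, roughly the same information is available as plain numbers from pandas. This is a small sketch on a made-up frame; the column names only imitate the zoo data and the values are invented.

```python
import numpy as np
import pandas as pd

# Toy frame with a couple of deliberately missing entries (invented values)
df = pd.DataFrame({
    "hair": [1, 0, np.nan, 1],
    "legs": [4, 2, 4, np.nan],
    "tail": [1, 1, 0, 1],
})

# Number of missing values in each column
missing = df.isnull().sum()
print(missing)
```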

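# Draw a heatmap of the pairwise correlation coefficients (numeric columns only)
plt.figure(figsize=(20,20))
p=sns.heatmap(data.select_dtypes(include='number').corr(), annot=True,
              annot_kws={'fontsize': 15}, square=True)

# Check the class distribution
data["class_type"].value_counts()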

# Separate the features from the labels
x_data = data.drop(['animal_name', 'class_type'], axis=1)
y_data = data['class_type']
from sklearn.model_selection import train_test_split
# Split the dataset; stratify=y_data keeps the class proportions in the training and test sets the same as in y_data
# For example, if classes 0 and 1 occur in a 1:2 ratio in y_data, they also occur in a 1:2 ratio in y_train and y_test
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data, test_size=0.3, stratify=y_data)
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)

tree.score(x_test, y_test)
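
The effect of `stratify` described in the comments above can be checked directly by counting labels on each side of the split. This sketch uses toy labels in a 1:2 ratio rather than the zoo data, and fixes `random_state` only so the run is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 20 samples of class 0 and 40 of class 1 (ratio 1:2)
y = np.array([0] * 20 + [1] * 40)
X = np.arange(len(y)).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Per-class counts on each side of the split; both keep the 1:2 ratio
print(np.bincount(y_train))  # [14 28]
print(np.bincount(y_test))   # [ 6 12]
```

Without `stratify`, a small or imbalanced dataset can end up with a rare class missing from the test set entirely, which makes the test score harder to trust.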

Model Tuning
param_grid = {
'max_depth': [5,10,15,20,25],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
model = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model.fit(x_train, y_train)
print(model.best_estimator_)

model.score(x_test, y_test)

param_grid = {
'max_depth': [8,9,10,11,12],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
model2 = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model2.fit(x_train, y_train)
print(model2.best_estimator_)
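
Narrowing the `max_depth` range for a second search, as done above, is easier to justify after looking at how each value scored in the first one. The sketch below shows one way to read the `cv_results_` attribute of a fitted `GridSearchCV`. It uses the iris dataset bundled with scikit-learn as a stand-in, and the grid is an assumption chosen for brevity.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn (not the zoo data)
X, y = load_iris(return_X_y=True)

param_grid = {"max_depth": [1, 2, 3, 5, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

# cv_results_ is a dict of arrays with one entry per parameter combination
results = pd.DataFrame(search.cv_results_)
summary = results[["param_max_depth", "mean_test_score",
                   "std_test_score", "rank_test_score"]]
print(summary.sort_values("rank_test_score"))
```

If the mean scores stop changing beyond some depth, searching deeper values is unlikely to help, and a finer grid around the best-ranked region is the more useful next step.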
