Machine Learning in Action - Decision Trees - 22
2022-07-28 11:51:00 【gemoumou】
Machine Learning in Action - Decision Trees - Leaf Classification

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
train = pd.read_csv('train.csv')

train.head()

train.shape

# Number of distinct leaf species
len(train.species.unique())

Data Preparation
# Encode the string class labels as integers
lb = LabelEncoder().fit(train.species)
labels = lb.transform(train.species)
# Drop the 'species' and 'id' columns, leaving only the features
data = train.drop(['species', 'id'], axis=1)
data.head()
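The label-encoding step above can be tried on its own with a tiny list. This is a minimal sketch; the species names below are illustrative placeholders and are not read from train.csv. It shows that `LabelEncoder` assigns each code as an index into its sorted `classes_` array and that `inverse_transform` recovers the original strings.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative species names (assumption: not taken from train.csv)
species = ["Quercus_Rubra", "Acer_Opalus", "Quercus_Rubra", "Betula_Pendula"]

encoder = LabelEncoder().fit(species)
codes = encoder.transform(species)

# classes_ holds the sorted unique labels; each code is an index into it
print(list(encoder.classes_))
print(list(codes))

# inverse_transform maps the integer codes back to the original strings
decoded = list(encoder.inverse_transform(codes))
print(decoded)
```

Keeping the fitted encoder (here `encoder`, `lb` in the post) is what lets predicted integer classes be turned back into species names later.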


# Split into training and test sets, stratified by label
x_train,x_test,y_train,y_test = train_test_split(data, labels, test_size=0.3, stratify=labels)
Modeling
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)


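Model Tuning
# max_depth: maximum depth of the tree
# min_samples_split: minimum number of samples required to split an internal node
# min_samples_leaf: minimum number of samples required at a leaf node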
param_grid = {
'max_depth': [30,40,50,60,70],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
# Grid search with 3-fold cross-validation
model = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model.fit(x_train, y_train)
print(model.best_estimator_)

model.score(x_train, y_train)

model.score(x_test, y_test)
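Beyond printing `best_estimator_`, a fitted `GridSearchCV` also exposes `best_params_` and `best_score_`, and comparing the training and test scores gives a rough sense of overfitting. The sketch below assumes a synthetic dataset from `make_classification` as a stand-in for train.csv and uses a smaller grid than the one above so it runs quickly; the numbers it prints say nothing about the leaf data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for train.csv (assumption, for illustration only)
X, y = make_classification(n_samples=300, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Reduced grid over the same three hyperparameters
param_grid = {
    "max_depth": [3, 5, 10],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 3],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

# Best hyperparameter combination and its mean cross-validated accuracy
print(search.best_params_)
print(search.best_score_)

# A large gap between these two accuracies suggests overfitting
train_acc = search.score(X_train, y_train)
test_acc = search.score(X_test, y_test)
print(train_acc, test_acc)
```

Fixing `random_state` on both the split and the tree makes repeated runs comparable, which the original cells do not guarantee.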

Decision Trees - Animal Classification

import pandas as pd
import numpy as np
# pip install missingno
import missingno as msno
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
data = pd.read_csv('zoo.csv')
data.head()

# Check the shape of the data
data.shape

# Check the dtype of each column
data.dtypes

data.describe()

# Visualize missing values per column
p=msno.bar(data)

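# Plot a heatmap; each cell is the correlation coefficient between two variables
# (numeric columns only, because animal_name is a string column)
plt.figure(figsize=(20,20))
p=sns.heatmap(data.select_dtypes(include='number').corr(), annot=True, annot_kws={'fontsize': 15}, square=True)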

# Check the class distribution
data["class_type"].value_counts()

# Separate the features from the labels
x_data = data.drop(['animal_name', 'class_type'], axis=1)
y_data = data['class_type']
from sklearn.model_selection import train_test_split
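# Split the dataset; stratify=y keeps the class proportions in the training and test sets the same as in y before the split
# For example, if classes 0 and 1 occur in a 1:2 ratio in y, they also occur in a 1:2 ratio in both y_train and y_test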
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data, test_size=0.3, stratify=y_data)
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)

tree.score(x_test, y_test)
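The effect of `stratify` described in the comments above can be checked directly. This is a small sketch on made-up labels with a 1:2 ratio of class 0 to class 1 (not the zoo data); it counts the classes on each side of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy labels: 30 samples of class 0 and 60 of class 1, i.e. a 1:2 ratio
y = np.array([0] * 30 + [1] * 60)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Per-class counts in each part; both should stay close to 1:2
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print(train_counts, test_counts)
```

Without `stratify`, the same call would sample at random and the ratio in a small test set could drift noticeably, which matters for rare classes.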

Model Tuning
param_grid = {
'max_depth': [5,10,15,20,25],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
model = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model.fit(x_train, y_train)
print(model.best_estimator_)

model.score(x_test, y_test)

param_grid = {
'max_depth': [8,9,10,11,12],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
model2 = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model2.fit(x_train, y_train)
print(model2.best_estimator_)
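The two searches above follow a coarse-to-fine pattern: a wide `max_depth` grid first, then a narrower one around a promising value. A minimal sketch of that pattern follows, assuming the iris dataset as a stand-in for zoo.csv; `narrow_range` is a hypothetical helper written for this example and is not part of scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (assumption: zoo.csv is not available here)
X, y = load_iris(return_X_y=True)

def narrow_range(center, width=2, minimum=1):
    """Hypothetical helper: integers within `width` of `center`, never below `minimum`."""
    return [v for v in range(center - width, center + width + 1) if v >= minimum]

# Coarse pass over widely spaced depths
coarse = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {"max_depth": [5, 10, 15, 20, 25]}, cv=3).fit(X, y)
best_depth = coarse.best_params_["max_depth"]

# Fine pass over the neighbours of the coarse winner
fine = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {"max_depth": narrow_range(best_depth)}, cv=3).fit(X, y)
print(best_depth, fine.best_params_, fine.best_score_)
```

For a coarse winner of 10, `narrow_range(10)` yields the same `[8, 9, 10, 11, 12]` grid used in the second search above. Because the fine grid still contains the coarse winner, its best cross-validated score cannot be lower when the folds and `random_state` are fixed.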
