Machine learning practice - decision tree-22
2022-07-28 12:53:00 【gemoumou】
Machine learning practice - Decision tree - Leaf classification

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
train = pd.read_csv('train.csv')

train.head()

train.shape

# Number of leaf species (classes)
len(train.species.unique())

Data Preparation
# Convert string categories to numeric form
lb = LabelEncoder().fit(train.species)
labels = lb.transform(train.species)
# Drop the 'species' and 'id' columns; the remaining columns are the features
data = train.drop(['species', 'id'], axis=1)
data.head()
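`LabelEncoder` sorts the unique class names and assigns each one an integer code from 0 to n_classes - 1 in that sorted order. `inverse_transform` maps predicted codes back to names. The minimal sketch below uses a few hypothetical species names in place of `train.species`:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical species names, standing in for train.species
species = ["Quercus_Rubra", "Acer_Opalus", "Quercus_Rubra", "Alnus_Rubra"]

lb = LabelEncoder().fit(species)
labels = lb.transform(species)

print(lb.classes_.tolist())  # → ['Acer_Opalus', 'Alnus_Rubra', 'Quercus_Rubra']
print(labels.tolist())       # → [2, 0, 2, 1]

# inverse_transform recovers the original strings from integer codes
print(lb.inverse_transform([0, 2]).tolist())  # → ['Acer_Opalus', 'Quercus_Rubra']
```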


# Split into training (70%) and test (30%) sets, keeping the class proportions of labels in both
x_train,x_test,y_train,y_test = train_test_split(data, labels, test_size=0.3, stratify=labels)
Modeling analysis
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)
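With default settings (`max_depth=None`, `min_samples_leaf=1`) the tree keeps splitting until every leaf is pure, so training accuracy reaches 1.0 whenever no two training rows have identical features. The gap between training and test accuracy is the motivation for the tuning step. A sketch, assuming synthetic data from `make_classification` as a stand-in for the leaf features:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the leaf features (continuous, so rows are distinct)
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

train_acc = tree.score(x_train, y_train)  # unpruned tree fits training data exactly
test_acc = tree.score(x_test, y_test)
print(f"train={train_acc:.3f} test={test_acc:.3f}")
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```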


Model optimization
# max_depth: maximum depth of the tree
# min_samples_split: minimum number of samples required to split an internal node
# min_samples_leaf: minimum number of samples required at each leaf node
param_grid = {
'max_depth': [30,40,50,60,70],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
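These three parameters are constraints that can be checked on a fitted tree through its low-level `tree_` attribute, where leaves are the nodes with `children_left == -1`. A sketch on synthetic data (an assumption, not the leaf dataset) with one arbitrary combination from a grid like the one above:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; the parameter values are one illustrative grid point
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

clf = DecisionTreeClassifier(max_depth=4, min_samples_split=6,
                             min_samples_leaf=3, random_state=0).fit(X, y)

t = clf.tree_
is_leaf = t.children_left == -1          # leaves have no children
leaf_sizes = t.n_node_samples[is_leaf]   # training samples reaching each leaf

print("depth:", clf.get_depth())
print("smallest leaf:", leaf_sizes.min())
# Every node that was split held at least min_samples_split samples
print("smallest split node:", t.n_node_samples[~is_leaf].min())
```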
# Grid search over every parameter combination with 3-fold cross-validation
model = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model.fit(x_train, y_train)
print(model.best_estimator_)
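Grid search tries every combination in `param_grid` on every fold, so the cost grows multiplicatively. By default (`refit=True`) `GridSearchCV` then refits the best combination on the whole training set, which is why `model.score` below uses `best_estimator_`; `model.best_params_` and `model.best_score_` hold the winning combination and its mean cross-validation accuracy. The count for the grid above can be checked with `ParameterGrid`:

```python
from sklearn.model_selection import ParameterGrid

param_grid = {
    'max_depth': [30, 40, 50, 60, 70],
    'min_samples_split': [2, 3, 4, 5, 6],
    'min_samples_leaf': [1, 2, 3, 4]}

n_candidates = len(ParameterGrid(param_grid))  # 5 * 5 * 4 combinations
n_fits = n_candidates * 3                       # cv=3 folds for each
print(n_candidates, n_fits)                     # → 100 300
```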

model.score(x_train, y_train)

model.score(x_test, y_test)

Decision tree - Animal classification

import pandas as pd
import numpy as np
# pip install missingno
import missingno as msno
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV
data = pd.read_csv('zoo.csv')
data.head()

# View the data shape (rows, columns)
data.shape

# View the data type of each column
data.dtypes

data.describe()

# Check for missing data
p=msno.bar(data)
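`msno.bar` draws the number of non-missing values per column. A text-only check gives the same information as a per-column count of missing values; the small frame below is hypothetical, standing in for `zoo.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value, standing in for zoo.csv
df = pd.DataFrame({"animal_name": ["aardvark", "bass", "crab"],
                   "legs": [4, 0, np.nan]})

missing = df.isnull().sum()      # per-column count of missing values
print(missing)
print("any missing:", bool(missing.any()))
```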

# Draw a heatmap; each cell is the correlation coefficient between two variables
# numeric_only=True skips the string column animal_name (pandas >= 2.0 raises otherwise)
plt.figure(figsize=(20,20))
p=sns.heatmap(data.corr(numeric_only=True), annot=True, annot_kws={'fontsize': 15}, square=True)
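By default `DataFrame.corr` computes the Pearson coefficient r = cov(X, Y) / (σ_X σ_Y), so values near +1 or -1 mark pairs of features that move together (or opposite) almost linearly. A sketch with hypothetical zoo-style binary columns, comparing pandas with NumPy's `corrcoef`:

```python
import numpy as np
import pandas as pd

# Hypothetical binary zoo-style columns plus a non-numeric name column
df = pd.DataFrame({"animal_name": ["a", "b", "c", "d", "e"],
                   "hair":     [1, 1, 0, 0, 1],
                   "milk":     [1, 1, 0, 0, 1],
                   "feathers": [0, 0, 1, 1, 0]})

corr = df.corr(numeric_only=True)  # string column is skipped
# Pearson r for one pair, computed directly with NumPy for comparison
r = np.corrcoef(df["hair"], df["feathers"])[0, 1]

print(corr.round(2))
print(round(r, 2))
```

Here `milk` duplicates `hair` (r = 1) and `feathers` is its complement (r = -1); such pairs carry redundant information for the tree.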

# View category distribution
data["class_type"].value_counts()
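Counts alone hide how imbalanced the classes are; `normalize=True` returns each class's share of the rows instead. This matters for the stratified split, since `train_test_split` with `stratify` raises an error if a class has fewer than two members. The labels below are hypothetical:

```python
import pandas as pd

# Hypothetical class_type labels standing in for data["class_type"]
class_type = pd.Series([1, 1, 1, 2, 2, 4, 1, 2])

counts = class_type.value_counts()                # absolute counts, largest first
shares = class_type.value_counts(normalize=True)  # fraction of all rows

print(counts)
print(shares.round(3))
```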

# Separate the features and the labels
x_data = data.drop(['animal_name', 'class_type'], axis=1)
y_data = data['class_type']
from sklearn.model_selection import train_test_split
# Split the dataset. stratify=y_data keeps the class proportions in the training and test sets the same as in y_data before the split.
# For example, if classes 0 and 1 appear in a 1:2 ratio before the split, they also appear in a 1:2 ratio in y_train and y_test.
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data, test_size=0.3, stratify=y_data)
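The 1:2 example from the comment can be checked directly. A sketch with 10 samples of class 0 and 20 of class 1, split 70/30 with stratification:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 30 samples with classes 0 and 1 in a 1:2 ratio
y = np.array([0] * 10 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)

_, _, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                         stratify=y, random_state=0)

print(np.bincount(y_train), np.bincount(y_test))  # → [ 7 14] [3 6]
```

Both parts keep the 1:2 ratio; without `stratify`, a small test set can easily end up with a different mix.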
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)

tree.score(x_test, y_test)
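For classifiers, `score` returns mean accuracy, i.e. the same value as `accuracy_score` on the model's predictions. A sketch on synthetic data (an assumption, not the zoo data) confirming the two agree:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_classes=3, random_state=1)
clf = DecisionTreeClassifier(random_state=1).fit(X[:150], y[:150])

# score() on a classifier is the fraction of correctly predicted labels
score = clf.score(X[150:], y[150:])
acc = accuracy_score(y[150:], clf.predict(X[150:]))
print(score, acc)
```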

Model optimization
param_grid = {
'max_depth': [5,10,15,20,25],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
model = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model.fit(x_train, y_train)
print(model.best_estimator_)

model.score(x_test, y_test)

param_grid = {
'max_depth': [8,9,10,11,12],
'min_samples_split': [2,3,4,5,6],
'min_samples_leaf':[1,2,3,4]}
model2 = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=3)
model2.fit(x_train, y_train)
print(model2.best_estimator_)
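The second search appears to be a coarse-to-fine refinement: a widely spaced grid first, then a denser grid for `max_depth` (8 to 12) around the region the first search favored. A simplified sketch that tunes only `max_depth` and builds the fine grid automatically from `best_params_`; the synthetic data and the window of ±2 are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=16, n_informative=5,
                           n_classes=4, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Coarse pass: widely spaced depths
coarse = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      {'max_depth': [5, 10, 15, 20, 25]}, cv=3).fit(x_train, y_train)
best = coarse.best_params_['max_depth']

# Fine pass: every depth within 2 of the coarse winner (never below 1)
fine_depths = list(range(max(1, best - 2), best + 3))
fine = GridSearchCV(DecisionTreeClassifier(random_state=0),
                    {'max_depth': fine_depths}, cv=3).fit(x_train, y_train)

print("coarse best:", best, "fine grid:", fine_depths)
print("fine best:", fine.best_params_, "test acc:", round(fine.score(x_test, y_test), 3))
```

Because the fine grid contains the coarse winner and both searches use the same deterministic folds, the fine search's best cross-validation score is never lower than the coarse one.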
