Machine Learning Basics - Decision Trees - 12
2022-07-28 11:51:00 【gemoumou】
Decision Tree












Decision Tree - Example
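from sklearn.feature_extraction import DictVectorizer
from sklearn import tree
from sklearn import preprocessing
import csv

# Feature dicts and labels, one entry per data row
featureList = []
labelList = []
# Read the data; the with-block closes the file when done
with open(r'AllElectronics.csv', 'r') as Dtree:
    reader = csv.reader(Dtree)
    # Get the header row
    headers = next(reader)
    print(headers)
    for row in reader:
        # The last column is the label
        labelList.append(row[-1])
        # Build a dict of feature name -> value,
        # skipping the first column and the label column
        rowDict = {}
        for i in range(1, len(row)-1):
            rowDict[headers[i]] = row[i]
        # Store the dict in the feature list
        featureList.append(rowDict)
print(featureList)

# One-hot encode the categorical features as 0/1 columns
vec = DictVectorizer()
x_data = vec.fit_transform(featureList).toarray()
print("x_data: " + str(x_data))
# Print the encoded feature names
# (get_feature_names() was removed in scikit-learn 1.2)
print(vec.get_feature_names_out())
# Print the labels
print("labelList: " + str(labelList))
# Encode the labels as 0/1
lb = preprocessing.LabelBinarizer()
y_data = lb.fit_transform(labelList)
print("y_data: " + str(y_data))

# Create the decision tree model, splitting on information gain
model = tree.DecisionTreeClassifier(criterion='entropy')
# Fit the model to the data
model.fit(x_data, y_data)

# Test on the first training sample
x_test = x_data[0]
print("x_test: " + str(x_test))
# predict expects a 2-D array, so reshape the single sample to (1, n_features)
predict = model.predict(x_test.reshape(1,-1))
print("predict: " + str(predict))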
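The model above is created with criterion='entropy', so each split is scored by how much it reduces entropy (information gain). The following is a minimal sketch of that calculation, not scikit-learn's implementation. The helper names `entropy` and `information_gain` and the toy rows are hypothetical and are not taken from AllElectronics.csv. This sketch also splits a categorical feature into all of its values at once, whereas scikit-learn makes binary splits on the one-hot columns.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    # Entropy of each group, weighted by the share of samples it holds
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

# Toy data invented for illustration only
rows = [{"student": "no"}, {"student": "no"}, {"student": "yes"}, {"student": "yes"}]
labels = ["no", "no", "yes", "yes"]
print(entropy(labels))                            # 1.0
print(information_gain(rows, labels, "student"))  # 1.0
```

Here the feature separates the two classes perfectly, so the gain equals the full entropy of the labels.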












Decision Tree - CART
from sklearn import tree
import numpy as np
# Load the data
data = np.genfromtxt("cart.csv", delimiter=",")
# Skip the first row and the first column; the last column is the label
x_data = data[1:,1:-1]
y_data = data[1:,-1]
# Create the decision tree model (the default criterion is Gini impurity)
model = tree.DecisionTreeClassifier()
# Fit the model to the data
model.fit(x_data, y_data)

# Export the decision tree
import graphviz # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file = None,
                                feature_names = ['house_yes','house_no','single','married','divorced','income'],
                                class_names = ['no','yes'],
                                filled = True,
                                rounded = True,
                                special_characters = True)
graph = graphviz.Source(dot_data)
# Render the tree to a file named 'cart'
graph.render('cart')
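DecisionTreeClassifier() with no arguments scores splits by Gini impurity, the measure CART uses for classification. As a rough sketch of that measure (the helper names `gini` and `split_gini` and the toy labels are hypothetical, not from cart.csv):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def split_gini(left, right):
    """Weighted Gini impurity of a binary split into left and right children."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

# Toy labels invented for illustration only
labels = [0, 0, 0, 1, 1, 1]
print(gini(labels))                      # 0.5
print(split_gini([0, 0, 0], [1, 1, 1]))  # 0.0
print(split_gini([0, 0, 1], [0, 1, 1]))  # a mixed split scores in between
```

A candidate split is better the lower its weighted impurity, so the pure split above would be preferred over the mixed one.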


Decision Tree - Linear Binary Classification
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn import tree
# Load the data
data = np.genfromtxt("LR-testSet.csv", delimiter=",")
x_data = data[:,:-1]
y_data = data[:,-1]
plt.scatter(x_data[:,0],x_data[:,1],c=y_data)
plt.show()

# Create the decision tree model
model = tree.DecisionTreeClassifier()
# Fit the model to the data
model.fit(x_data, y_data)

# Export the decision tree
import graphviz # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file = None,
                                feature_names = ['x','y'],
                                class_names = ['label0','label1'],
                                filled = True,
                                rounded = True,
                                special_characters = True)
graph = graphviz.Source(dot_data)

# Range covered by the data
x_min, x_max = x_data[:, 0].min() - 1, x_data[:, 0].max() + 1
y_min, y_max = x_data[:, 1].min() - 1, x_data[:, 1].max() + 1
# Build the grid of points to classify
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
# ravel and flatten both turn a multi-dimensional array into 1-D;
# flatten always returns a copy, ravel returns a view when it can
z = model.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
# Filled contour plot of the predicted class
cs = plt.contourf(xx, yy, z)
# Scatter plot of the samples
plt.scatter(x_data[:, 0], x_data[:, 1], c=y_data)
plt.show()

predictions = model.predict(x_data)
# classification_report expects (y_true, y_pred)
print(classification_report(y_data, predictions))
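The listing above builds a graphviz.Source object but never renders it. Where Graphviz is not installed, scikit-learn's tree.export_text can print the learned rules as plain text instead. A small sketch, assuming synthetic points in place of LR-testSet.csv (the data and the max_depth=3 setting are invented for illustration):

```python
import numpy as np
from sklearn import tree

# Synthetic, linearly separable points (invented for illustration)
rng = np.random.default_rng(0)
x_data = rng.uniform(-1, 1, size=(100, 2))
y_data = (x_data[:, 0] + x_data[:, 1] > 0).astype(int)

# A shallow tree keeps the printed rules short
model = tree.DecisionTreeClassifier(max_depth=3)
model.fit(x_data, y_data)

# Plain-text rendering of the learned splits
rules = tree.export_text(model, feature_names=['x', 'y'])
print(rules)
```

Every rule tests a single feature against a threshold, which is why the decision regions in the contour plot are made of axis-aligned rectangles even when the true boundary is a slanted line.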

Decision Tree - Non-linear Binary Classification
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn import tree
from sklearn.model_selection import train_test_split
# Load the data
data = np.genfromtxt("LR-testSet2.txt", delimiter=",")
x_data = data[:,:-1]
y_data = data[:,-1]
plt.scatter(x_data[:,0],x_data[:,1],c=y_data)
plt.show()

# Split the data into training and test sets
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data)
# Create the decision tree model
# max_depth: maximum depth of the tree
# min_samples_split: minimum number of samples an internal node needs before it can be split
model = tree.DecisionTreeClassifier(max_depth=7,min_samples_split=4)
# Fit the model to the training data
model.fit(x_train, y_train)

# Export the decision tree
import graphviz # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file = None,
                                feature_names = ['x','y'],
                                class_names = ['label0','label1'],
                                filled = True,
                                rounded = True,
                                special_characters = True)
graph = graphviz.Source(dot_data)

# Range covered by the data
x_min, x_max = x_data[:, 0].min() - 1, x_data[:, 0].max() + 1
y_min, y_max = x_data[:, 1].min() - 1, x_data[:, 1].max() + 1
# Build the grid of points to classify
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
# ravel and flatten both turn a multi-dimensional array into 1-D;
# flatten always returns a copy, ravel returns a view when it can
z = model.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
# Filled contour plot of the predicted class
cs = plt.contourf(xx, yy, z)
# Scatter plot of the samples
plt.scatter(x_data[:, 0], x_data[:, 1], c=y_data)
plt.show()

# classification_report expects (y_true, y_pred)
# Performance on the training set
predictions = model.predict(x_train)
print(classification_report(y_train, predictions))

# Performance on the test set
predictions = model.predict(x_test)
print(classification_report(y_test, predictions))
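The max_depth and min_samples_split settings above limit how far the tree can grow, which is the usual way to keep it from memorizing the training set. The sketch below compares a few depths. It assumes scikit-learn's make_moons as a synthetic stand-in for LR-testSet2.txt, and the depth values and random seeds are arbitrary choices for illustration.

```python
from sklearn import tree
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Synthetic non-linear data (a stand-in, not the original file)
x_data, y_data = make_moons(n_samples=400, noise=0.3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, random_state=0)

scores = {}
for depth in (1, 3, 7, None):  # None lets the tree grow until every leaf is pure
    model = tree.DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(x_train, y_train)
    # (training accuracy, test accuracy) for this depth
    scores[depth] = (model.score(x_train, y_train), model.score(x_test, y_test))
    print(depth, scores[depth])
```

An unrestricted tree reaches perfect training accuracy on data like this, so a gap between its training and test scores is a sign of over-fitting rather than of a better model.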


Regression Tree
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
# Load the data
data = np.genfromtxt("data.csv", delimiter=",")
# np.newaxis turns each column into a 2-D array of shape (n_samples, 1)
x_data = data[:,0,np.newaxis]
y_data = data[:,1,np.newaxis]
plt.scatter(x_data,y_data)
plt.show()

# Create and fit the regression tree
model = tree.DecisionTreeRegressor(max_depth=5)
model.fit(x_data, y_data)

# Evenly spaced test inputs
x_test = np.linspace(20,80,100)
x_test = x_test[:,np.newaxis]
# Plot the samples and the fitted curve
plt.plot(x_data, y_data, 'b.')
plt.plot(x_test, model.predict(x_test), 'r')
plt.show()

# Export the decision tree
import graphviz # http://www.graphviz.org/
# The model has a single input feature, so feature_names needs exactly one entry;
# class_names does not apply to a regressor
dot_data = tree.export_graphviz(model,
                                out_file = None,
                                feature_names = ['x'],
                                filled = True,
                                rounded = True,
                                special_characters = True)
graph = graphviz.Source(dot_data)
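A regression tree predicts one constant value per leaf, which is why the red curve in the plot above is a staircase rather than a smooth line. The following sketch checks that property on synthetic data. The sine-shaped data, the noise level and max_depth=3 are assumptions made for illustration, since data.csv is not available here.

```python
import numpy as np
from sklearn import tree

# Synthetic 1-D data (invented for illustration)
rng = np.random.default_rng(0)
x_data = np.sort(rng.uniform(20, 80, size=200))[:, np.newaxis]
y_data = np.sin(x_data[:, 0] / 10.0) + rng.normal(0, 0.1, size=200)

model = tree.DecisionTreeRegressor(max_depth=3)
model.fit(x_data, y_data)

# Predict on a dense grid of inputs
x_test = np.linspace(20, 80, 1000)[:, np.newaxis]
pred = model.predict(x_test)

# The number of distinct predicted values cannot exceed the number of leaves
n_levels = len(np.unique(pred))
print(n_levels, model.get_n_leaves())
```

With max_depth=3 the tree has at most 2**3 = 8 leaves, so the 1000 predictions take at most 8 distinct values.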

Regression Tree - Predicting House Prices
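from sklearn import tree
# Import from sklearn.datasets; the sklearn.datasets.california_housing
# module path no longer exists in current scikit-learn releases
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
# Downloads the dataset on first use
housing = fetch_california_housing()
print(housing.DESCR)

# Shape of the feature matrix
print(housing.data.shape)

# Features of the first sample
print(housing.data[0])

# Target value of the first sample
print(housing.target[0])

x_data = housing.data
y_data = housing.target
# Split the data into training and test sets
x_train,x_test,y_train,y_test = train_test_split(x_data, y_data)
# Create and fit the regression tree
model = tree.DecisionTreeRegressor()
model.fit(x_train, y_train)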
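The listing stops after fitting, without measuring how well the tree predicts. One way to evaluate a fitted regressor is its R² score and mean squared error on the held-out split. The sketch below assumes synthetic data from make_regression so that it runs without downloading the housing dataset. The sample count, noise level and max_depth=5 are arbitrary choices for illustration.

```python
from sklearn import tree
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in with 8 features (the housing data also has 8)
x_data, y_data = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, random_state=0)

model = tree.DecisionTreeRegressor(max_depth=5, random_state=0)
model.fit(x_train, y_train)

# score() returns R^2 for regressors; compare training and held-out data
train_r2 = model.score(x_train, y_train)
test_r2 = model.score(x_test, y_test)
test_mse = mean_squared_error(y_test, model.predict(x_test))
print(train_r2, test_r2, test_mse)
```

The same three lines of evaluation apply unchanged to the housing model above once x_test and y_test come from its own train_test_split call.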

