Machine Learning Basics: Decision Trees (12)
2022-07-28 11:51:00 【gemoumou】
Decision Tree












Decision Tree: Example
from sklearn.feature_extraction import DictVectorizer
from sklearn import tree
from sklearn import preprocessing
import csv

# Read the data
with open(r'AllElectronics.csv', 'r', newline='') as Dtree:
    reader = csv.reader(Dtree)
    # Get the header row
    headers = next(reader)
    print(headers)
    # One list for the feature dictionaries, one for the labels
    featureList = []
    labelList = []
    for row in reader:
        # The last column is the label
        labelList.append(row[-1])
        # Build a {column name: value} dictionary,
        # skipping the first column (index 0) and the label column
        rowDict = {}
        for i in range(1, len(row) - 1):
            rowDict[headers[i]] = row[i]
        # Store the feature dictionary
        featureList.append(rowDict)
print(featureList)

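# One-hot encode the categorical features as 0/1 columns
vec = DictVectorizer()
x_data = vec.fit_transform(featureList).toarray()
print("x_data: " + str(x_data))
# Print the encoded feature names
# (get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out())
print(vec.get_feature_names_out())
# Print the labels
print("labelList: " + str(labelList))
# Convert the labels to 0/1
lb = preprocessing.LabelBinarizer()
y_data = lb.fit_transform(labelList)
print("y_data: " + str(y_data))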
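
To make the one-hot encoding step easier to picture, here is a minimal sketch on two made-up rows. The rows and their values are hypothetical and are not taken from AllElectronics.csv; `sparse=False` is used only so the result prints as a dense array.

```python
from sklearn.feature_extraction import DictVectorizer

# Two hypothetical rows (an assumption, not data from AllElectronics.csv)
rows = [
    {"age": "youth", "student": "no"},
    {"age": "senior", "student": "yes"},
]

vec = DictVectorizer(sparse=False)  # return a dense NumPy array directly
encoded = vec.fit_transform(rows)

# Each (feature, value) pair becomes its own 0/1 column, sorted by name
print(vec.feature_names_)
print(encoded)
```

Each string-valued feature is expanded into one column per distinct value, which is why the encoded matrix in the script above has more columns than the CSV has attributes.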

# Create the decision tree model, using entropy as the split criterion
model = tree.DecisionTreeClassifier(criterion='entropy')
# Fit the model to the data
model.fit(x_data, y_data)

# Test: predict the first training sample
x_test = x_data[0]
print("x_test: " + str(x_test))
predict = model.predict(x_test.reshape(1, -1))
print("predict: " + str(predict))
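
The `criterion='entropy'` setting chooses splits by how much they reduce label entropy. The sketch below computes entropy and information gain by hand on hypothetical toy data. It splits on every distinct feature value (ID3-style), whereas scikit-learn only makes binary splits on the encoded columns, so treat it as an illustration of the quantity being optimized rather than of the library's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(labels, feature_values):
    """Entropy reduction obtained by grouping `labels` by `feature_values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups after the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical toy data (an assumption for illustration)
labels = ["yes", "yes", "no", "no"]
perfect = ["a", "a", "b", "b"]   # separates the classes completely
useless = ["a", "b", "a", "b"]   # tells us nothing about the label

print(entropy(labels))
print(information_gain(labels, perfect))
print(information_gain(labels, useless))
```

A feature that separates the classes completely recovers all of the entropy, while one that is independent of the label gains nothing.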












Decision Tree: CART
from sklearn import tree
import numpy as np
# Load the data
data = np.genfromtxt("cart.csv", delimiter=",")
# Skip the header row and the first column; the last column is the label
x_data = data[1:, 1:-1]
y_data = data[1:, -1]
# Create the decision tree model (the default split criterion is Gini impurity)
model = tree.DecisionTreeClassifier()
# Fit the model to the data
model.fit(x_data, y_data)

# Export the decision tree
import graphviz  # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file=None,
                                feature_names=['house_yes', 'house_no', 'single', 'married', 'divorced', 'income'],
                                class_names=['no', 'yes'],
                                filled=True,
                                rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
# Writes the DOT source to 'cart' and the rendered tree to 'cart.pdf'
graph.render('cart')
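
`DecisionTreeClassifier()` uses Gini impurity by default, which is the value shown as `gini` in each node of the exported tree. A minimal hand-written sketch of that quantity, using hypothetical labels rather than the contents of cart.csv:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split, weighted by the size of each child."""
    total = len(left) + len(right)
    return len(left) / total * gini(left) + len(right) / total * gini(right)

# Hypothetical labels (an assumption for illustration)
parent = ["yes", "yes", "no", "no"]
print(gini(parent))                                # evenly mixed node
print(weighted_gini(["yes", "yes"], ["no", "no"])) # both children are pure
```

CART picks the binary split with the lowest weighted impurity, so a split that leaves both children pure is the best possible one.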


Decision Tree: Linear Binary Classification
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn import tree
# Load the data
data = np.genfromtxt("LR-testSet.csv", delimiter=",")
x_data = data[:, :-1]
y_data = data[:, -1]
plt.scatter(x_data[:, 0], x_data[:, 1], c=y_data)
plt.show()

# Create the decision tree model
model = tree.DecisionTreeClassifier()
# Fit the model to the data
model.fit(x_data, y_data)

# Export the decision tree
import graphviz  # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file=None,
                                feature_names=['x', 'y'],
                                class_names=['label0', 'label1'],
                                filled=True,
                                rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)

# Get the range covered by the data
x_min, x_max = x_data[:, 0].min() - 1, x_data[:, 0].max() + 1
y_min, y_max = x_data[:, 1].min() - 1, x_data[:, 1].max() + 1
# Build the grid
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
# ravel and flatten both turn a multi-dimensional array into 1-D.
# flatten always returns a copy; ravel returns a view when it can,
# so writing to its result may change the original array.
z = model.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
# Filled contour plot of the decision regions
cs = plt.contourf(xx, yy, z)
# Scatter plot of the samples
plt.scatter(x_data[:, 0], x_data[:, 1], c=y_data)
plt.show()

predictions = model.predict(x_data)
# classification_report expects (y_true, y_pred)
print(classification_report(y_data, predictions))
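
The difference between `ravel` and `flatten` mentioned in the grid-prediction step can be checked directly. This is a small sketch on a throwaway array (not the plotting grid): `flatten` always copies, while `ravel` returns a view when the memory layout allows it.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_copy = a.flatten()  # always a new, independent array
flat_view = a.ravel()    # a view here, because `a` is contiguous

flat_copy[0] = 100       # does not touch `a`
flat_view[1] = 200       # writes through to `a`

print(a)
print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True
```

In the plotting code the flattened grids are only read, never written, so either function would give the same picture.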

Decision Tree: Nonlinear Binary Classification
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report
from sklearn import tree
from sklearn.model_selection import train_test_split
# Load the data
data = np.genfromtxt("LR-testSet2.txt", delimiter=",")
x_data = data[:, :-1]
y_data = data[:, -1]
plt.scatter(x_data[:, 0], x_data[:, 1], c=y_data)
plt.show()

# Split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data)
# Create the decision tree model
# max_depth: maximum depth of the tree
# min_samples_split: minimum number of samples an internal node needs before it can be split
model = tree.DecisionTreeClassifier(max_depth=7, min_samples_split=4)
# Fit the model to the training data
model.fit(x_train, y_train)

# Export the decision tree
import graphviz  # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file=None,
                                feature_names=['x', 'y'],
                                class_names=['label0', 'label1'],
                                filled=True,
                                rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)

# Get the range covered by the data
x_min, x_max = x_data[:, 0].min() - 1, x_data[:, 0].max() + 1
y_min, y_max = x_data[:, 1].min() - 1, x_data[:, 1].max() + 1
# Build the grid
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                     np.arange(y_min, y_max, 0.02))
# ravel and flatten both turn a multi-dimensional array into 1-D.
# flatten always returns a copy; ravel returns a view when it can,
# so writing to its result may change the original array.
z = model.predict(np.c_[xx.ravel(), yy.ravel()])
z = z.reshape(xx.shape)
# Filled contour plot of the decision regions
cs = plt.contourf(xx, yy, z)
# Scatter plot of the samples
plt.scatter(x_data[:, 0], x_data[:, 1], c=y_data)
plt.show()

# Report on the training set; classification_report expects (y_true, y_pred)
predictions = model.predict(x_train)
print(classification_report(y_train, predictions))

# Report on the test set
predictions = model.predict(x_test)
print(classification_report(y_test, predictions))
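
Comparing the training and test reports is how overfitting shows up, and `max_depth` is the main knob for controlling it. The sketch below varies the depth on a synthetic two-moons dataset from `make_moons`, which is a stand-in assumed here because LR-testSet2.txt is not available; the depths tried and the noise level are arbitrary choices.

```python
from sklearn import tree
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Synthetic nonlinear data (an assumption; not the LR-testSet2.txt file)
x, y = make_moons(n_samples=400, noise=0.3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

scores = {}
for depth in (1, 3, 7, None):  # None lets the tree grow until leaves are pure
    model = tree.DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(x_train, y_train)
    # Accuracy on the data the tree saw vs. data it did not see
    scores[depth] = (model.score(x_train, y_train),
                     model.score(x_test, y_test))
    print(depth, scores[depth])
```

An unrestricted tree typically reaches perfect training accuracy while its test accuracy stalls or drops, which is the motivation for limiting `max_depth` and `min_samples_split`.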


Regression Tree
import numpy as np
import matplotlib.pyplot as plt
from sklearn import tree
# Load the data
data = np.genfromtxt("data.csv", delimiter=",")
x_data = data[:, 0, np.newaxis]
y_data = data[:, 1, np.newaxis]
plt.scatter(x_data, y_data)
plt.show()

# Create and fit the regression tree
model = tree.DecisionTreeRegressor(max_depth=5)
model.fit(x_data, y_data)

# Evenly spaced test inputs as a column vector
x_test = np.linspace(20, 80, 100)
x_test = x_test[:, np.newaxis]
# Plot the samples and the fitted curve
plt.plot(x_data, y_data, 'b.')
plt.plot(x_test, model.predict(x_test), 'r')
plt.show()

# Export the decision tree
# The model has a single input feature, so feature_names has one entry;
# class_names applies only to classifiers and is omitted here
import graphviz  # http://www.graphviz.org/
dot_data = tree.export_graphviz(model,
                                out_file=None,
                                feature_names=['x'],
                                filled=True,
                                rounded=True,
                                special_characters=True)
graph = graphviz.Source(dot_data)
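
A regression tree predicts the mean target of the leaf a sample falls into, so its fitted curve is a step function with at most `2 ** max_depth` distinct levels. The sketch below checks this on synthetic noisy sine data, which is an assumption made so the example runs without data.csv.

```python
import numpy as np
from sklearn import tree

# Synthetic 1-D data (an assumption; not the data.csv file)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=200))[:, np.newaxis]
y = np.sin(x).ravel() + rng.normal(scale=0.1, size=200)

model = tree.DecisionTreeRegressor(max_depth=3)
model.fit(x, y)

# Predict on a dense grid and count the distinct output values
grid = np.linspace(0, 10, 1000)[:, np.newaxis]
levels = np.unique(model.predict(grid))
print(len(levels), model.get_n_leaves())
```

Raising `max_depth` adds more, narrower steps, which fits the training points more closely at the cost of a noisier curve.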

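Regression Tree: Predicting House Prices
from sklearn import tree
# The sklearn.datasets.california_housing submodule is no longer importable
# in current scikit-learn; import the loader from sklearn.datasets instead
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
housing = fetch_california_housing()
print(housing.DESCR)

# Shape of the feature matrix
print(housing.data.shape)

# Features of the first sample
print(housing.data[0])

# Target value of the first sample
print(housing.target[0])

x_data = housing.data
y_data = housing.target
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data)
model = tree.DecisionTreeRegressor()
model.fit(x_train, y_train)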
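
The script stops after fitting, so a natural next step is to score the regressor on both splits. The sketch below shows one way to do that with `score` (R²) and `mean_squared_error`. It runs on synthetic data from `make_regression`, an assumption made so it works offline; the same three scoring lines apply unchanged to the housing split above.

```python
from sklearn import tree
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption, so the sketch runs offline);
# use housing.data / housing.target to score the housing model instead
x_data, y_data = make_regression(n_samples=500, n_features=8,
                                 noise=10.0, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data,
                                                    random_state=0)

model = tree.DecisionTreeRegressor(random_state=0)
model.fit(x_train, y_train)

train_r2 = model.score(x_train, y_train)  # R^2 on the training data
test_r2 = model.score(x_test, y_test)     # R^2 on held-out data
test_mse = mean_squared_error(y_test, model.predict(x_test))
print(train_r2, test_r2, test_mse)
```

An unrestricted regression tree usually scores close to 1.0 on the training split, so the held-out score is the one that says how well it generalizes.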

