[Data Mining] Task 5: K-means/DBSCAN Clustering: Two Nested Squares
2022-07-03 01:09:00 【zstar-_】
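Requirements
Write a program that clusters the following data set: two nested squares (a small square inside a larger one).
Imports and global settings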
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
plt.rcParams['font.sans-serif'] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False
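Generating the nested-square data
# a: 900 coordinates per outer side; b: 500 coordinates per inner side
a = np.arange(1, 10, 0.01)
b = np.arange(3, 8, 0.01)
# Columns: x, y, true label (0 = outer square, 1 = inner square)
w = np.zeros((5600, 3))
# Outer square points
w[:900, 0] = a
w[:900, 1] = 1
w[900:1800, 0] = 1
w[900:1800, 1] = a
w[1800:2700, 0] = a
w[1800:2700, 1] = 10
w[2700:3600, 0] = 10
w[2700:3600, 1] = a
# Inner square points
w[3600:4100, 0] = b
w[3600:4100, 1] = 3
w[4100:4600, 0] = 3
w[4100:4600, 1] = b
w[4600:5100, 0] = b
w[4600:5100, 1] = 8
w[5100:, 0] = 8
w[5100:, 1] = b
# Label the inner-square points as class 1
w[3600:, 2] = 1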
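The slices above rely on `a` having exactly 900 elements and `b` exactly 500, and `np.arange` with a floating-point step is known to be sensitive to rounding. The following is a small sanity check, not part of the original post; the helper name `make_nested_squares` is hypothetical and builds the same points side by side instead of by slice assignment.

```python
import numpy as np


def make_nested_squares():
    """Return an (n, 3) array with columns x, y, label (0 = outer, 1 = inner)."""
    a = np.arange(1, 10, 0.01)  # coordinates along each outer side
    b = np.arange(3, 8, 0.01)   # coordinates along each inner side

    def sides(t, lo, hi):
        # Same order as the post: bottom, left, top, right
        return np.vstack([
            np.column_stack([t, np.full_like(t, lo)]),
            np.column_stack([np.full_like(t, lo), t]),
            np.column_stack([t, np.full_like(t, hi)]),
            np.column_stack([np.full_like(t, hi), t]),
        ])

    outer = sides(a, 1, 10)
    inner = sides(b, 3, 8)
    labels = np.r_[np.zeros(len(outer)), np.ones(len(inner))]
    return np.column_stack([np.vstack([outer, inner]), labels])


w_check = make_nested_squares()
n_outer = int(np.sum(w_check[:, 2] == 0))
n_inner = int(np.sum(w_check[:, 2] == 1))
print(w_check.shape, n_outer, n_inner)
```

If the printed shape ever differs from the `(5600, 3)` that the post hard-codes, the slice assignments would fail with a broadcasting error, so deriving the sizes from `len(a)` and `len(b)` is the more defensive choice.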
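K-Means clustering
Parameters
n_clusters: the number of clusters to form
random_state: the seed for the random centroid initialization; fixing it makes the result reproducible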
# Fit on the x and y columns only; column 2 holds the true label and must not be used as a feature
cluster = KMeans(n_clusters=2, random_state=0)
y = cluster.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
# One scatter call per panel instead of one call per point
plt.scatter(w[:, 0], w[:, 1], c=[colors[int(k)] for k in w[:, 2]])
plt.title("Original data")
plt.subplot(2, 2, 2)
plt.scatter(w[:, 0], w[:, 1], c=[colors[k] for k in y])
plt.title("Clustered data")
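Besides the plot, one way to see what K-Means does with this data is to count how the points of each true square are distributed over the two K-Means clusters. The sketch below is an addition, not the author's code: it regenerates the same points in a hypothetical helper `nested_squares`, fits on the x and y columns only, and sets `n_init=10` explicitly (an assumption, chosen so the behavior does not depend on the scikit-learn version's default).

```python
import numpy as np
from sklearn.cluster import KMeans


def nested_squares():
    """Return (xy, truth): the post's points and their true layer (0 outer, 1 inner)."""
    def sides(t, lo, hi):
        # bottom, left, top, right sides of one square
        return np.vstack([
            np.column_stack([t, np.full_like(t, lo)]),
            np.column_stack([np.full_like(t, lo), t]),
            np.column_stack([t, np.full_like(t, hi)]),
            np.column_stack([np.full_like(t, hi), t]),
        ])

    outer = sides(np.arange(1, 10, 0.01), 1, 10)
    inner = sides(np.arange(3, 8, 0.01), 3, 8)
    truth = np.r_[np.zeros(len(outer), dtype=int), np.ones(len(inner), dtype=int)]
    return np.vstack([outer, inner]), truth


xy, truth = nested_squares()
pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(xy)

# Contingency table: rows = true square, columns = K-Means cluster
table = np.zeros((2, 2), dtype=int)
np.add.at(table, (truth, pred), 1)
print(table)
```

If K-Means had recovered the two squares, each row of the table would have a single nonzero entry. Instead, both rows are spread over both columns, because the two centroids split the plane into two halves and each half contains part of both squares.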

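DBSCAN clustering
Parameters
eps: the distance threshold of the ϵ-neighborhood; samples farther than ϵ from a point are not in its ϵ-neighborhood. The default is 0.5.
min_samples: the minimum number of samples needed to form a dense region. A point is a core point if its neighborhood (the disk of radius eps centered on the point, boundary included) contains at least this many samples, counting the point itself.
A label of y = -1 marks a noise point (outlier).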
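To make the roles of `eps`, `min_samples`, and the `-1` label concrete, here is a toy example that is not from the original post. The seven points are made up for illustration: two tight groups of three points each and one isolated point.

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],   # tight group A
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],   # tight group B
    [10.0, 10.0],                         # isolated point
])

# Each grouped point has 3 samples (itself included) within eps, so it is a core point
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels)) - (1 if n_noise else 0)
print(labels, n_clusters, n_noise)

# Raising min_samples to 4 leaves no core points, so every sample becomes noise
labels_strict = DBSCAN(eps=0.5, min_samples=4).fit_predict(X)
print(labels_strict)
```

The cluster count subtracts one whenever the noise label is present, because `-1` is not a cluster of its own.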
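Because DBSCAN does not take the number of clusters as an input, we define a function that searches for the parameters that produce the requested number of clusters.
Among those combinations, the best one is the one with the fewest outliers.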
# Parameter search
def search_best_parameter(N_clusters, X):
    min_outliers = float("inf")
    best_eps = 0
    best_min_samples = 0
    # Try different eps values
    for eps in np.arange(0.001, 1, 0.05):
        # Try different min_samples values
        for min_samples in range(2, 10):
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            # Fit the model
            y = dbscan.fit_predict(X)
            # Number of noise points (label -1)
            outliers = int(np.sum(y == -1))
            # Number of clusters for this combination, not counting the noise label
            n_clusters = len(np.unique(y)) - (1 if outliers > 0 else 0)
            if outliers < min_outliers and n_clusters == N_clusters:
                min_outliers = outliers
                best_eps = eps
                best_min_samples = min_samples
    # best_eps stays 0 if no combination produced N_clusters clusters
    return best_eps, best_min_samples
# Search and fit on the x and y columns only; column 2 holds the true label
eps, min_samples = search_best_parameter(2, w[:, :2])
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
y = dbscan.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
plt.scatter(w[:, 0], w[:, 1], c=[colors[int(k)] for k in w[:, 2]])
plt.title("Original data")
plt.subplot(2, 2, 2)
# A noise label (-1), if present, would index the last color
plt.scatter(w[:, 0], w[:, 1], c=[colors[k] for k in y])
plt.title("Clustered data")

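Summary
For the nested-square data, K-Means is not a suitable clustering method, whereas DBSCAN gives good results. K-Means assigns each point to its nearest centroid, so with two clusters it cuts the plane along a straight line and cannot separate one square from another that encloses it. DBSCAN instead grows clusters by linking neighboring points, so it can follow each square's outline.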
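The comparison can also be expressed as a number. The sketch below is an addition to the post: it scores both methods against the true layer labels with the adjusted Rand index (1.0 means a perfect match, values near 0 mean chance-level agreement). The helper `nested_squares` is hypothetical, `n_init=10` is an assumption, and `eps=0.051` with `min_samples=2` are assumed values taken from the search grid (0.001 + 0.05), not parameters reported by the author.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score


def nested_squares():
    """Return (xy, truth) for the post's data: 0 = outer square, 1 = inner square."""
    def sides(t, lo, hi):
        # bottom, left, top, right sides of one square
        return np.vstack([
            np.column_stack([t, np.full_like(t, lo)]),
            np.column_stack([np.full_like(t, lo), t]),
            np.column_stack([t, np.full_like(t, hi)]),
            np.column_stack([np.full_like(t, hi), t]),
        ])

    outer = sides(np.arange(1, 10, 0.01), 1, 10)
    inner = sides(np.arange(3, 8, 0.01), 3, 8)
    truth = np.r_[np.zeros(len(outer), dtype=int), np.ones(len(inner), dtype=int)]
    return np.vstack([outer, inner]), truth


xy, truth = nested_squares()

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(xy)
# Assumed parameters: neighbors along a side are 0.01 apart, the squares are 2 apart
dbscan_labels = DBSCAN(eps=0.051, min_samples=2).fit_predict(xy)

ari_kmeans = adjusted_rand_score(truth, kmeans_labels)
ari_dbscan = adjusted_rand_score(truth, dbscan_labels)
print(f"K-Means ARI: {ari_kmeans:.3f}")
print(f"DBSCAN  ARI: {ari_dbscan:.3f}")
```

The adjusted Rand index ignores how the clusters are numbered, so it does not matter which square DBSCAN happens to label 0 or 1.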