[Data Mining] Task 5: K-means/DBSCAN Clustering: Nested Squares
2022-07-03 01:09:00 【zstar-_】
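Requirements
Write a program that clusters the following data: two nested squares (a smaller square inside a larger one).
Imports and global settings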
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
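# Use the SimHei font so that Chinese text in plot labels renders correctly
plt.rcParams['font.sans-serif'] = ["SimHei"]
# Keep the minus sign rendering correctly when a CJK font is active
plt.rcParams["axes.unicode_minus"] = False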
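Generating the nested-squares data
# Side coordinates sampled every 0.01: 900 values for the outer square, 500 for the inner one
a = np.arange(1, 10, 0.01)
b = np.arange(3, 8, 0.01)
# Columns: x, y, ground-truth label (0 = outer square, 1 = inner square)
w = np.zeros((5600, 3))
# Points on the outer square
w[:900, 0] = a
w[:900, 1] = 1
w[900:1800, 0] = 1
w[900:1800, 1] = a
w[1800:2700, 0] = a
w[1800:2700, 1] = 10
w[2700:3600, 0] = 10
w[2700:3600, 1] = a
# Points on the inner square
w[3600:4100, 0] = b
w[3600:4100, 1] = 3
w[4100:4600, 0] = 3
w[4100:4600, 1] = b
w[4600:5100, 0] = b
w[4600:5100, 1] = 8
w[5100:, 0] = 8
w[5100:, 1] = b
# Label the inner-square points as class 1
w[3600:, 2] = 1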
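K-Means clustering
Parameters
n_clusters: the number of clusters to form
random_state: seeds the random centroid initialization, so that repeated runs give the same result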
cluster = KMeans(n_clusters=2, random_state=0)
# Cluster on the x/y coordinates only; column 2 holds the ground-truth label and must not be used as a feature
y = cluster.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
# One scatter call for all points instead of one call per point
plt.scatter(w[:, 0], w[:, 1], c=[colors[int(k)] for k in w[:, 2]])
plt.title("Original data")
plt.subplot(2, 2, 2)
plt.scatter(w[:, 0], w[:, 1], c=[colors[k] for k in y])
plt.title("Clustered data")
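One way to see why K-Means has trouble with nested shapes: after fitting, each sample carries the label of its nearest cluster center, so with two clusters the dividing boundary is a straight line that cannot wrap around an inner square. The sketch below checks that nearest-center property on hypothetical random data; the seed, the data, and the variable names are illustrative assumptions and not part of the original task.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical data: 200 random points in the same 1..10 range as the squares.
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(200, 2))

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Distance from every point to every fitted center, shape (200, 2).
dists = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# The fitted labels coincide with the index of the nearest center.
print((labels == nearest).all())
```

Because the assignment depends only on distance to two centers, an outer square and an inner square that share roughly the same center of mass cannot end up in separate clusters.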

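DBSCAN clustering
Parameters
eps: the distance threshold of the ϵ-neighborhood. Samples farther than ϵ from a given sample lie outside its ϵ-neighborhood. The default is 0.5.
min_samples: the minimum number of samples needed to form a dense region. A point is a core point if its neighborhood (the disk of radius eps centered on the point, boundary included) contains at least this many samples, counting the point itself.
A label of y = -1 marks a sample as an outlier (noise).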
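To make these two parameters concrete, here is a minimal sketch on hypothetical toy data (five points spaced 0.1 apart on a line plus one distant point). The data and the chosen `eps` and `min_samples` values are assumptions made only for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: five collinear points 0.1 apart, plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [0.3, 0.0], [0.4, 0.0], [5.0, 5.0]])

# eps=0.15 reaches only the immediate neighbors on the line.
db = DBSCAN(eps=0.15, min_samples=3).fit(X)
print(db.labels_)               # cluster label per point, -1 for noise
print(db.core_sample_indices_)  # indices of the core points

# Requiring 6 samples per neighborhood leaves no core point at all.
strict = DBSCAN(eps=0.15, min_samples=6).fit(X)
print(strict.labels_)
```

With the first setting, the three middle points each have three samples in their neighborhood and become core points, the two end points join the cluster as border points, and the isolated point is labeled -1. With the stricter setting every point is labeled -1.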
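The number of clusters DBSCAN produces is not fixed in advance, so we define a function that searches for the parameters that yield a specified number of clusters.
Among the combinations that do, the best one is the one with the fewest outliers.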
# Search for suitable parameters
def search_best_parameter(N_clusters, X):
    # No valid combination found yet; np.inf avoids the arbitrary cap of 999
    min_outliers = np.inf
    best_eps = 0
    best_min_samples = 0
    # Iterate over eps values
    for eps in np.arange(0.001, 1, 0.05):
        # Iterate over min_samples values
        for min_samples in range(2, 10):
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            # Fit the model
            y = dbscan.fit_predict(X)
            # Number of outliers (label -1)
            outliers = int(np.sum(y == -1))
            # Number of clusters for this combination, not counting the noise label
            n_clusters = len(np.unique(y)) - (1 if outliers > 0 else 0)
            if outliers < min_outliers and n_clusters == N_clusters:
                min_outliers = outliers
                best_eps = eps
                best_min_samples = min_samples
    # Note: (0, 0) is returned if no combination yields N_clusters clusters
    return best_eps, best_min_samples
# Use the x/y coordinates only; column 2 holds the ground-truth label
eps, min_samples = search_best_parameter(2, w[:, :2])
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
y = dbscan.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
plt.scatter(w[:, 0], w[:, 1], c=[colors[int(k)] for k in w[:, 2]])
plt.title("Original data")
plt.subplot(2, 2, 2)
# Any noise point (label -1) would pick colors[-1] and be drawn in red
plt.scatter(w[:, 0], w[:, 1], c=[colors[k] for k in y])
plt.title("Clustered data")

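Summary
For the nested-squares data, K-Means is not a suitable clustering method: it assigns each point to the nearest centroid and so cannot separate one square from another that encloses it. The density-based DBSCAN method, by contrast, gives good results.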
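The comparison can also be expressed as a number rather than read off a plot. The sketch below rebuilds equivalent data with a hypothetical helper `square_outline` and scores both methods against the true labels with the adjusted Rand index (1.0 means a perfect match, values near 0 mean no better than chance). The helper, the use of `adjusted_rand_score`, and the DBSCAN values `eps=0.051`, `min_samples=2` are assumptions made for this sketch, not part of the original post.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

def square_outline(lo, hi, step=0.01):
    """Hypothetical helper: points on the four sides of a square from lo to hi."""
    t = np.arange(lo, hi, step)
    lo_col, hi_col = np.full_like(t, lo), np.full_like(t, hi)
    return np.vstack([
        np.column_stack([t, lo_col]),   # bottom side
        np.column_stack([lo_col, t]),   # left side
        np.column_stack([t, hi_col]),   # top side
        np.column_stack([hi_col, t]),   # right side
    ])

outer = square_outline(1, 10)
inner = square_outline(3, 8)
X = np.vstack([outer, inner])
truth = np.r_[np.zeros(len(outer), dtype=int), np.ones(len(inner), dtype=int)]

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Assumed parameters: eps a little above the 0.01 point spacing, far below the gap of 2.
db_labels = DBSCAN(eps=0.051, min_samples=2).fit_predict(X)

ari_km = adjusted_rand_score(truth, km_labels)
ari_db = adjusted_rand_score(truth, db_labels)
print(f"K-Means ARI: {ari_km:.3f}")
print(f"DBSCAN  ARI: {ari_db:.3f}")
```

The adjusted Rand index ignores how the cluster labels are numbered, so it does not matter whether a method calls the inner square cluster 0 or cluster 1.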