当前位置:网站首页>【数据挖掘】任务5:K-means/DBSCAN聚类:双层正方形
【数据挖掘】任务5:K-means/DBSCAN聚类:双层正方形
2022-07-03 01:09:00 【zstar-_】
要求
编程如下数据聚类:双层正方形
导库与全局设置
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
plt.rcParams['font.sans-serif'] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False
生成双层正方形数据
a = np.arange(1, 10, 0.01)
b = np.arange(3, 8, 0.01)
w = np.zeros((5600, 3))
# 外层正方形点
w[:900, 0] = a
w[:900, 1] = 1
w[900:1800, 0] = 1
w[900:1800, 1] = a
w[1800:2700, 0] = a
w[1800:2700, 1] = 10
w[2700:3600, 0] = 10
w[2700:3600, 1] = a
# 内层正方形点
w[3600:4100, 0] = b
w[3600:4100, 1] = 3
w[4100:4600, 0] = 3
w[4100:4600, 1] = b
w[4600:5100, 0] = b
w[4600:5100, 1] = 8
w[5100:, 0] = 8
w[5100:, 1] = b
w[3600:, 2] = 1
K-Means 聚类
参数说明
n_clusters:聚类个数
random_state:控制参数随机性
cluster = KMeans(n_clusters=2, random_state=0)
y = cluster.fit_predict(w)
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
for i in range(len(w)):
plt.scatter(w[i][0], w[i][1], color=colors[int(w[i][2])])
plt.title("原始数据")
plt.subplot(2, 2, 2)
for i in range(len(y)):
plt.scatter(w[i][0], w[i][1], color=colors[y[i]])
plt.title("聚类后数据")

DBSCAN 聚类
参数说明
eps:ϵ-邻域的距离阈值,和样本距离超过ϵ的样本点不在ϵ-邻域内,默认值是0.5。
min_samples:形成高密度区域的最小点数。作为核心点的话邻域(即以其为圆心,eps为半径的圆,含圆上的点)中的最小样本数(包括点本身)。
若y=-1,则为异常点。
由于DBSCAN生成的类别不确定,因此定义一个函数用来筛选出符合指定类别的最合适的参数。
合适的标准是异常点个数最少。
# 筛选参数
def search_best_parameter(N_clusters, X):
min_outliners = 999
best_eps = 0
best_min_samples = 0
# 迭代不同的eps值
for eps in np.arange(0.001, 1, 0.05):
# 迭代不同的min_samples值
for min_samples in range(2, 10):
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
# 模型拟合
y = dbscan.fit_predict(X)
# 统计各参数组合下的聚类个数(-1表示异常点)
if len(np.argwhere(y == -1)) == 0:
n_clusters = len(np.unique(y))
else:
n_clusters = len(np.unique(y)) - 1
# 异常点的个数
outliners = len([i for i in y if i == -1])
if outliners < min_outliners and n_clusters == N_clusters:
min_outliners = outliners
best_eps = eps
best_min_samples = min_samples
return best_eps, best_min_samples
eps, min_samples = search_best_parameter(2, w)
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
y = dbscan.fit_predict(w)
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
for i in range(len(w)):
plt.scatter(w[i][0], w[i][1], color=colors[int(w[i][2])])
plt.title("原始数据")
plt.subplot(2, 2, 2)
for i in range(len(y)):
plt.scatter(w[i][0], w[i][1], color=colors[y[i]])
plt.title("聚类后数据")

总结
对于双层正方形数据来说,K-Means聚类方法不适合进行聚类,而采用DBSCAN方法可以取得较好的效果。
边栏推荐
- What are the trading forms of spot gold and what are the profitable advantages?
- Uniapp component -uni notice bar notice bar
- 一位苦逼程序员的找工作经历
- Dotconnect for PostgreSQL data provider
- Work experience of a hard pressed programmer
- Button wizard play strange learning - automatic return to the city route judgment
- LeetCode 987. Vertical order transverse of a binary tree - Binary Tree Series Question 7
- 不登陆或者登录解决oracle数据库账号被锁定。
- 按键精灵打怪学习-多线程后台坐标识别
- Concise analysis of redis source code 11 - Main IO threads and redis 6.0 multi IO threads
猜你喜欢

Detailed explanation of Q-learning examples of reinforcement learning

强化学习 Q-learning 实例详解

MySQL基础用法02

Basis of information entropy
![[flutter] icons component (fluttericon Download Icon | customize SVG icon to generate TTF font file | use the downloaded TTF icon file)](/img/ca/1d2473ae51c59b84864352eb17de94.jpg)
[flutter] icons component (fluttericon Download Icon | customize SVG icon to generate TTF font file | use the downloaded TTF icon file)

Cut point of undirected graph
![[FPGA tutorial case 6] design and implementation of dual port RAM based on vivado core](/img/fb/c371ffaa9614c6f2fd581ba89eb2ab.png)
[FPGA tutorial case 6] design and implementation of dual port RAM based on vivado core

dotConnect for PostgreSQL数据提供程序

Find a benchmark comrade in arms | a million level real-time data platform, which can be used for free for life

Database SQL language 01 where condition
随机推荐
LeetCode 987. Vertical order transverse of a binary tree - Binary Tree Series Question 7
Canvas drawing -- bingdd
The industrial scope of industrial Internet is large enough. The era of consumer Internet is only a limited existence in the Internet industry
按键精灵打怪学习-多线程后台坐标识别
C application interface development foundation - form control (2) - MDI form
按鍵精靈打怪學習-多線程後臺坐標識別
[untitled]
Meibeer company is called "Manhattan Project", and its product name is related to the atomic bomb, which has caused dissatisfaction among Japanese netizens
英语常用词汇
[Androd] Gradle 使用技巧之模块依赖替换
MySQL基础用法02
Soft exam information system project manager_ Real topic over the years_ Wrong question set in the second half of 2019_ Morning comprehensive knowledge question - Senior Information System Project Man
After reading this article, I will teach you to play with the penetration test target vulnhub - drivetingblues-9
Arduino dy-sv17f automatic voice broadcast
[androd] module dependency replacement of gradle's usage skills
数学知识:Nim游戏—博弈论
Cut point of undirected graph
Draw love with go+ to express love to her beloved
Mathematical Knowledge: Steps - Nim Games - Game Theory
Thinkphp+redis realizes simple lottery