[Data Mining] Task 5: K-Means/DBSCAN clustering: double square
2022-07-03 01:34:00 【zstar-_】
Requirement
Write a program to cluster the following data set: the double square (two nested squares).
Import libraries and global settings
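import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
# Use the SimHei font so that Chinese text in plots renders correctly
plt.rcParams['font.sans-serif'] = ["SimHei"]
# Keep the minus sign displayable with that font
plt.rcParams["axes.unicode_minus"] = False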
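Generate the double-square data
# Coordinates along one side of each square (step 0.01)
a = np.arange(1, 10, 0.01)  # 900 values for the outer square
b = np.arange(3, 8, 0.01)   # 500 values for the inner square
# Columns: x, y, true label (0 = outer square, 1 = inner square)
w = np.zeros((5600, 3))
# Outer square points (4 sides x 900 points)
w[:900, 0] = a
w[:900, 1] = 1
w[900:1800, 0] = 1
w[900:1800, 1] = a
w[1800:2700, 0] = a
w[1800:2700, 1] = 10
w[2700:3600, 0] = 10
w[2700:3600, 1] = a
# Inner square points (4 sides x 500 points)
w[3600:4100, 0] = b
w[3600:4100, 1] = 3
w[4100:4600, 0] = 3
w[4100:4600, 1] = b
w[4600:5100, 0] = b
w[4600:5100, 1] = 8
w[5100:, 0] = 8
w[5100:, 1] = b
# Label the inner square as class 1
w[3600:, 2] = 1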
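The slice-by-slice construction above can also be written with a small helper. The following is only a sketch: `square_outline` and `w_alt` are hypothetical names that do not appear in the original code, and the helper simply reproduces the same four sides (bottom, left, top, right) in the same order.

```python
import numpy as np


def square_outline(lo, hi, step=0.01):
    """Return points on the outline of the square [lo, hi] x [lo, hi].

    Each side is sampled at lo, lo + step, ... with the far endpoint
    excluded, exactly as np.arange does in the code above.
    """
    t = np.arange(lo, hi, step)
    bottom = np.column_stack([t, np.full_like(t, lo)])
    left = np.column_stack([np.full_like(t, lo), t])
    top = np.column_stack([t, np.full_like(t, hi)])
    right = np.column_stack([np.full_like(t, hi), t])
    return np.vstack([bottom, left, top, right])


outer = square_outline(1, 10)  # 4 sides x 900 points
inner = square_outline(3, 8)   # 4 sides x 500 points

# Third column holds the true label: 0 = outer square, 1 = inner square
w_alt = np.vstack([
    np.column_stack([outer, np.zeros(len(outer))]),
    np.column_stack([inner, np.ones(len(inner))]),
])
print(w_alt.shape)  # (5600, 3)
```

Keeping the coordinates and the label in separate columns makes it easy to pass only the first two columns to a clustering model.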
K-Means clustering
Parameter description
n_clusters: the number of clusters to form.
random_state: the seed for the random centroid initialization; fixing it makes the result reproducible.
cluster = KMeans(n_clusters=2, random_state=0)
# Cluster on the x/y coordinates only; column 2 holds the true label
y = cluster.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
# First panel: points colored by their true label
plt.subplot(2, 2, 1)
for i in range(len(w)):
    plt.scatter(w[i][0], w[i][1], color=colors[int(w[i][2])])
plt.title("Raw data")
# Second panel: points colored by their K-Means cluster
plt.subplot(2, 2, 2)
for i in range(len(y)):
    plt.scatter(w[i][0], w[i][1], color=colors[y[i]])
plt.title("Clustered data")
plt.show()
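Besides looking at the plot, the K-Means result can be checked numerically by counting how many points of each true square end up in each cluster. The sketch below makes several assumptions that are not in the original: it rebuilds a coarser version of the data (spacing 0.05 instead of 0.01) so that it runs quickly, the helper `square_points` is a hypothetical name, and `n_init=10` is set explicitly to avoid version-dependent defaults.

```python
import numpy as np
from sklearn.cluster import KMeans


def square_points(lo, hi, n):
    """Return 4 * n points on the outline of the square [lo, hi] x [lo, hi]."""
    t = np.linspace(lo, hi, n, endpoint=False)
    fixed_lo, fixed_hi = np.full(n, float(lo)), np.full(n, float(hi))
    sides = [(t, fixed_lo), (fixed_lo, t), (t, fixed_hi), (fixed_hi, t)]
    return np.vstack([np.column_stack(s) for s in sides])


outer = square_points(1, 10, 180)  # spacing 0.05 (coarser than the article)
inner = square_points(3, 8, 100)
xy = np.vstack([outer, inner])
truth = np.r_[np.zeros(len(outer), dtype=int), np.ones(len(inner), dtype=int)]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(xy)

# Contingency table: rows = K-Means cluster, columns = true square
table = np.zeros((2, 2), dtype=int)
np.add.at(table, (labels, truth), 1)
print(table)
```

If K-Means had separated the two squares, every row of the table would contain a zero. Because each point is assigned to its nearest centroid, two clusters are divided by a straight line, so each cluster should be expected to hold points from both squares.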

DBSCAN clustering
Parameter description
eps: the radius ϵ of a sample's neighborhood. Samples farther than ϵ from a given sample lie outside its ϵ-neighborhood. The default value is 0.5.
min_samples: the minimum number of samples (including the point itself) that must lie within a point's ϵ-neighborhood, that is, the circle of radius eps centered on the point with its boundary included, for that point to count as a core point of a high-density region.
A label of y = -1 marks a sample as an outlier.
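To make these parameters concrete, here is a minimal sketch on made-up toy data (the four points and the chosen `eps` and `min_samples` values are illustrative assumptions, not part of the task). Three points lie within 0.5 of one another and form a cluster; the fourth is isolated and receives the label -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Three points close together plus one far away (made-up toy data)
X = np.array([[0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [5.0, 5.0]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
labels = db.labels_

# Count outliers (label -1) and clusters (all other distinct labels)
n_outliers = int(np.sum(labels == -1))
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print(labels)                   # [ 0  0  0 -1]
print(db.core_sample_indices_)  # [0 1 2]
print(n_clusters, n_outliers)   # 1 1
```

Each of the first three points has three samples in its neighborhood when it counts itself, which meets `min_samples=3`, so all three are core points.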
Because the number of clusters DBSCAN produces is not known in advance, define a function that searches for the parameters yielding the specified number of clusters.
Among the parameter combinations that do, the best one is the one with the fewest outliers.
# Search for the best parameters
def search_best_parameter(N_clusters, X):
    min_outliers = len(X) + 1  # larger than any possible outlier count
    best_eps = 0
    best_min_samples = 0
    # Try different eps values
    for eps in np.arange(0.001, 1, 0.05):
        # Try different min_samples values
        for min_samples in range(2, 10):
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            # Fit the model
            y = dbscan.fit_predict(X)
            # Number of outliers (label -1)
            outliers = int(np.sum(y == -1))
            # Number of clusters, not counting the outlier label
            n_clusters = len(np.unique(y)) - (1 if outliers > 0 else 0)
            # Keep the combination with the fewest outliers
            if outliers < min_outliers and n_clusters == N_clusters:
                min_outliers = outliers
                best_eps = eps
                best_min_samples = min_samples
    return best_eps, best_min_samples
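# Search and cluster on the x/y coordinates only; column 2 holds the true label
eps, min_samples = search_best_parameter(2, w[:, :2])
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
y = dbscan.fit_predict(w[:, :2])
# Index -1 (an outlier) maps to the last color, gray
colors = ['black', 'red', 'gray']
plt.figure(figsize=(15, 15))
# First panel: points colored by their true label
plt.subplot(2, 2, 1)
for i in range(len(w)):
    plt.scatter(w[i][0], w[i][1], color=colors[int(w[i][2])])
plt.title("Raw data")
# Second panel: points colored by their DBSCAN cluster
plt.subplot(2, 2, 2)
for i in range(len(y)):
    plt.scatter(w[i][0], w[i][1], color=colors[y[i]])
plt.title("Clustered data")
plt.show()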
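Since the true labels are known, the visual comparison can be backed by a score. The sketch below uses the adjusted Rand index from `sklearn.metrics`, which is 1.0 for a perfect match and close to 0 for a random assignment. It is a sketch under assumptions that are not in the original: the data is rebuilt at a coarser spacing of 0.05 so that it runs quickly, `outline` is a hypothetical helper, and `eps=0.2` is a hand-picked value for that spacing rather than the output of the search function above.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.metrics import adjusted_rand_score


def outline(lo, hi, n):
    """n evenly spaced points per side on the square [lo, hi] x [lo, hi]."""
    t = np.linspace(lo, hi, n, endpoint=False)
    lo_col, hi_col = np.full(n, float(lo)), np.full(n, float(hi))
    return np.vstack([
        np.column_stack([t, lo_col]),  # bottom side
        np.column_stack([lo_col, t]),  # left side
        np.column_stack([t, hi_col]),  # top side
        np.column_stack([hi_col, t]),  # right side
    ])


outer, inner = outline(1, 10, 180), outline(3, 8, 100)  # spacing 0.05
xy = np.vstack([outer, inner])
truth = np.r_[np.zeros(len(outer), dtype=int), np.ones(len(inner), dtype=int)]

# eps = 0.2 is a hand-picked value for this coarser spacing (an assumption)
db_labels = DBSCAN(eps=0.2, min_samples=2).fit_predict(xy)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(xy)

ari_dbscan = adjusted_rand_score(truth, db_labels)
ari_kmeans = adjusted_rand_score(truth, km_labels)
print(f"DBSCAN ARI:  {ari_dbscan:.3f}")
print(f"K-Means ARI: {ari_kmeans:.3f}")
```

The adjusted Rand index ignores how the clusters are numbered, so it does not matter whether a model calls the inner square cluster 0 or cluster 1.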

Summary
For the double-square data, K-Means is not a suitable clustering method: it assigns each point to the nearest centroid, and the two squares are nested around the same center. DBSCAN, which groups points by density, achieves better results.