[Data Mining] Task 5: K-Means/DBSCAN clustering: double square
2022-07-03 01:34:00 【zstar-_】
Requirement
Write a program that clusters the following data set: a double square (two nested squares).
Import libraries and global settings
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
plt.rcParams['font.sans-serif'] = ["SimHei"]
plt.rcParams["axes.unicode_minus"] = False
Generate double-layer square data
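# Coordinates along one side of each square, spaced 0.01 apart
a = np.arange(1, 10, 0.01)  # 900 values, outer square (side from 1 to 10)
b = np.arange(3, 8, 0.01)   # 500 values, inner square (side from 3 to 8)
# Columns: x, y, true label (0 = outer square, 1 = inner square)
w = np.zeros((5600, 3))
# Outer square points
w[:900, 0] = a          # bottom side, y = 1
w[:900, 1] = 1
w[900:1800, 0] = 1      # left side, x = 1
w[900:1800, 1] = a
w[1800:2700, 0] = a     # top side, y = 10
w[1800:2700, 1] = 10
w[2700:3600, 0] = 10    # right side, x = 10
w[2700:3600, 1] = a
# Inner square points
w[3600:4100, 0] = b     # bottom side, y = 3
w[3600:4100, 1] = 3
w[4100:4600, 0] = 3     # left side, x = 3
w[4100:4600, 1] = b
w[4600:5100, 0] = b     # top side, y = 8
w[4600:5100, 1] = 8
w[5100:, 0] = 8         # right side, x = 8
w[5100:, 1] = b
# Label the inner square points as class 1
w[3600:, 2] = 1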
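The slice assignments above only work if `a` has exactly 900 elements and `b` exactly 500. NumPy's documentation notes that the output length of `np.arange` with a non-integer step can be affected by floating-point rounding and suggests `np.linspace` for such cases. A minimal sketch of that alternative follows; the names `a_alt` and `b_alt` are illustrative and do not appear in the original code.

```python
import numpy as np

# Illustrative alternative to np.arange(1, 10, 0.01) and np.arange(3, 8, 0.01):
# linspace takes the number of samples directly, so the lengths are fixed.
a_alt = np.linspace(1, 10, 900, endpoint=False)  # 1.00, 1.01, ..., 9.99
b_alt = np.linspace(3, 8, 500, endpoint=False)   # 3.00, 3.01, ..., 7.99

print(a_alt.shape, b_alt.shape)
```

With `endpoint=False` the stop value is excluded, which matches the half-open interval that `np.arange` uses.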
K-Means clustering
Parameter description
n_clusters: number of clusters
random_state: seed for the random number generator used to initialize the centroids; fixing it makes the result reproducible
cluster = KMeans(n_clusters=2, random_state=0)
# Fit on the coordinate columns only; column 2 holds the true label
y = cluster.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
# Color each point by its true label
plt.scatter(w[:, 0], w[:, 1], c=[colors[int(k)] for k in w[:, 2]])
plt.title("Raw data")
plt.subplot(2, 2, 2)
# Color each point by its K-Means cluster
plt.scatter(w[:, 0], w[:, 1], c=[colors[k] for k in y])
plt.title("Data after clustering")

DBSCAN clustering
Parameter description
eps: the distance threshold of the ϵ-neighborhood. Sample points farther than ϵ from a given sample are not in its ϵ-neighborhood. The default value is 0.5.
min_samples: the minimum number of points needed to form a dense region. For a point to be a core point, its neighborhood (the circle of radius eps centered on it, including points on the boundary) must contain at least this many samples, counting the point itself.
A sample with y = -1 is an outlier.
Because the number of clusters DBSCAN produces is not known in advance, we define a function that searches for the parameters that yield the specified number of clusters.
Among the parameter combinations that do, the one with the fewest outliers is chosen.
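To make the roles of eps, min_samples, and the -1 label concrete, here is a small sketch on a toy data set. The points and the parameter values are made up for illustration and are not part of the original task.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups on a line plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.0],
              [5.0, 0.0], [5.1, 0.0], [5.2, 0.0],
              [20.0, 0.0]])

# Neighbors within each group are about 0.1 apart, so eps=0.15 links them;
# the isolated point has no neighbor within eps and cannot join a cluster.
model = DBSCAN(eps=0.15, min_samples=2).fit(X)
labels = model.labels_

# Count clusters without the outlier label, and count the outliers.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_outliers = int(np.sum(labels == -1))
print(labels, n_clusters, n_outliers)
```

The two groups become two clusters, and the isolated point receives the label -1.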
# Search for the best parameters
def search_best_parameter(N_clusters, X):
    min_outliers = len(X) + 1  # larger than any possible outlier count
    best_eps = 0
    best_min_samples = 0
    # Iterate over eps values
    for eps in np.arange(0.001, 1, 0.05):
        # Iterate over min_samples values
        for min_samples in range(2, 10):
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            # Fit the model
            y = dbscan.fit_predict(X)
            # Number of outliers (label -1)
            outliers = int(np.sum(y == -1))
            # Number of clusters, not counting the outlier label
            n_clusters = len(np.unique(y)) - (1 if outliers > 0 else 0)
            # Keep the combination with the fewest outliers among those
            # that produce the requested number of clusters
            if outliers < min_outliers and n_clusters == N_clusters:
                min_outliers = outliers
                best_eps = eps
                best_min_samples = min_samples
    return best_eps, best_min_samples
# Search and fit on the coordinate columns only; column 2 holds the true label
eps, min_samples = search_best_parameter(2, w[:, :2])
dbscan = DBSCAN(eps=eps, min_samples=min_samples)
y = dbscan.fit_predict(w[:, :2])
colors = ['black', 'red']
plt.figure(figsize=(15, 15))
plt.subplot(2, 2, 1)
# Color each point by its true label
plt.scatter(w[:, 0], w[:, 1], c=[colors[int(k)] for k in w[:, 2]])
plt.title("Raw data")
plt.subplot(2, 2, 2)
# Color each point by its DBSCAN cluster
# (assumes no outliers: a label of -1 would wrap around to colors[-1])
plt.scatter(w[:, 0], w[:, 1], c=[colors[k] for k in y])
plt.title("Data after clustering")
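The plotting code indexes `colors` directly with the cluster label, so an outlier label of -1 would silently pick the last color in the list. A possible way to handle this is sketched below; `label_colors`, its parameters, and the gray outlier color are hypothetical choices, not part of the original post.

```python
import numpy as np

def label_colors(labels, palette=("black", "red"), outlier_color="gray"):
    """Map cluster labels to color names; label -1 (outlier) gets outlier_color."""
    return [outlier_color if k == -1 else palette[k % len(palette)]
            for k in labels]

# Hypothetical label vector such as DBSCAN might return
example_labels = np.array([0, 0, 1, -1, 1])
print(label_colors(example_labels))
```

The returned list can be passed as the `c` argument of `plt.scatter`, so outliers stay visually distinct from both clusters.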

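Summary
For the double-square data, K-Means is not a suitable clustering method, whereas DBSCAN gives much better results. K-Means assigns every point to its nearest centroid, so with two clusters the boundary is a straight line, which cannot separate two nested squares that share the same center. DBSCAN groups points by density, so each connected square forms its own cluster.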
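The comparison can also be checked numerically rather than only by eye. The sketch below rebuilds an equivalent data set and scores both methods against the true labels with the adjusted Rand index (1.0 means perfect agreement, values near 0 mean chance-level agreement). The `square` helper, the use of `adjusted_rand_score`, and the values eps=0.05 and min_samples=2 are assumptions made for this illustration and are not taken from the original code.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

def square(lo, hi, n):
    """Return 4*n points on the perimeter of the square [lo, hi] x [lo, hi]."""
    t = np.linspace(lo, hi, n, endpoint=False)
    lo_col, hi_col = np.full(n, lo), np.full(n, hi)
    return np.vstack([
        np.column_stack([t, lo_col]),   # bottom side
        np.column_stack([lo_col, t]),   # left side
        np.column_stack([t, hi_col]),   # top side
        np.column_stack([hi_col, t]),   # right side
    ])

# Same layout as the data above: 3600 outer points, then 2000 inner points.
X = np.vstack([square(1, 10, 900), square(3, 8, 500)])
true_labels = np.r_[np.zeros(3600, dtype=int), np.ones(2000, dtype=int)]

kmeans_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Assumed parameters: points are 0.01 apart and the squares are 2 apart.
dbscan_pred = DBSCAN(eps=0.05, min_samples=2).fit_predict(X)

ari_kmeans = adjusted_rand_score(true_labels, kmeans_pred)
ari_dbscan = adjusted_rand_score(true_labels, dbscan_pred)
print(f"K-Means ARI: {ari_kmeans:.3f}")
print(f"DBSCAN  ARI: {ari_dbscan:.3f}")
```

The adjusted Rand index ignores how the clusters are numbered, so it does not matter which square a method calls cluster 0.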