[Data Clustering] Datasets, Visualization, and Notes for This Column
2022-06-12 06:38:00 【快乐江湖】
1: Common Datasets for Clustering
(1) Common datasets
Iris dataset

Synthetic dataset 438-3

Jain dataset
melon dataset

Spril dataset

threeCircles dataset

Square dataset

lineblobs dataset

788points dataset

gassian dataset

arrevation dataset

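(2) Download
Follow me, then send me a private message to get the datasets (I reply quickly).
- Download links break easily, and sharing them this way makes updates easier. Thanks for understanding.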
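2: Data Normalization
To put all features on the same scale, normalize the dataset first. The common "0-1 normalization" (min-max scaling) is sufficient here.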
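For reference, here is the column-wise 0-1 (min-max) normalization. In it, $x_{ij}$ is the value of feature $j$ for sample $i$, which matches `min(0)` and `max(0)` in the code:

$$
x'_{ij} = \frac{x_{ij} - m_j}{M_j - m_j}, \qquad m_j = \min_i x_{ij}, \quad M_j = \max_i x_{ij}
$$

For example, a column with values (1, 3, 5) has $m_j = 1$ and $M_j = 5$, so it maps to (0, 0.5, 1). Every normalized value lies in [0, 1], provided $M_j > m_j$.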
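```python
import numpy as np

# train_data: numeric array of shape (n_samples, n_features)
min_vals = train_data.min(0)       # per-column minimum
max_vals = train_data.max(0)       # per-column maximum
ranges = max_vals - min_vals       # per-column range
nums = train_data.shape[0]         # number of samples
# subtract each column's minimum, then divide by that column's range
normal_data = train_data - np.tile(min_vals, (nums, 1))
normal_data = normal_data / np.tile(ranges, (nums, 1))
print(normal_data)
```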
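The snippet above uses `np.tile` to repeat the per-column minimum and range for every row. NumPy broadcasting does this automatically. Below is a minimal sketch of the same step as a reusable function. It assumes a 2-D numeric input. The name `min_max_normalize` is hypothetical, and the guard for constant columns (where max equals min, so the original code would divide by zero) is an addition that is not in the post.

```python
import numpy as np


def min_max_normalize(data):
    """Column-wise 0-1 (min-max) normalization via broadcasting (hypothetical helper).

    Returns (normalized, min_vals, ranges) so the transform can be inverted later.
    Constant columns (range 0) are mapped to 0 instead of dividing by zero;
    this safeguard is an addition, not part of the original code.
    """
    data = np.asarray(data, dtype=float)
    min_vals = data.min(axis=0)             # per-column minimum
    ranges = data.max(axis=0) - min_vals    # per-column range
    safe_ranges = np.where(ranges == 0, 1.0, ranges)
    # (n, d) - (d,) broadcasts row by row, so np.tile is not needed
    return (data - min_vals) / safe_ranges, min_vals, ranges
```

Returning `min_vals` and `ranges` as well keeps the information needed to map results such as cluster centers back to the original units.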
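3: Visualization
Showing clustering results is an important step. Below are a few examples, each suited to a different type of clustering algorithm, that you can adapt to your own code with minor changes.
- Basic familiarity with Python plotting is recommended.
(1) Method 1
- Generally suited to partition-based clustering algorithms (such as K-Means)
- The algorithm usually returns two lists: the cluster centers and the cluster index of each sample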
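The example uses `KMeans2.k_means(normal_data, k, max_iterations)`, the author's own module, whose code is not included in the post. To run the example without it, you could use a stand-in like the minimal sketch below. It is plain Lloyd-style K-Means with random initialization. The function name, the `seed` parameter, and the `(n_samples, 1)` shape of the returned cluster indices are assumptions chosen to fit how the example consumes the results. They are not the author's actual implementation.

```python
import numpy as np


def _assign(data, centroids):
    # squared Euclidean distance from every sample to every centroid, shape (n, k)
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)  # index of the nearest centroid for each sample


def k_means(data, k, max_iterations, seed=0):
    """Minimal K-Means sketch (hypothetical stand-in for KMeans2.k_means).

    Returns (centroids, cluster): centroids has shape (k, n_features);
    cluster has shape (n_samples, 1) and holds each sample's cluster index.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize with k distinct samples picked at random
    centroids = data[rng.choice(data.shape[0], size=k, replace=False)].copy()
    for _ in range(max_iterations):
        labels = _assign(data, centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = data[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # final assignment against the final centroids
    return centroids, _assign(data, centroids).reshape(-1, 1)
```

Because `cluster` is 2-D here, `(cluster == centroid_id).flatten()` in the plotting loop yields a 1-D boolean mask over the samples.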
```python
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import KMeans2  # the author's own K-Means module (not included in this post)

# flower species (can be used to color the left panel by known label)
Iris_types = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
Iris_data = pd.read_csv('./Iris.csv')
x_axis = 'PetalLengthCm'  # petal length
y_axis = 'PetalWidthCm'   # petal width
# x_axis = 'SepalLengthCm'  # sepal length
# y_axis = 'SepalWidthCm'   # sepal width
examples_num = Iris_data.shape[0]  # number of samples
train_data = Iris_data[[x_axis, y_axis]].values.reshape(examples_num, 2)  # (n, 2) feature matrix

# 0-1 normalization
min_vals = train_data.min(0)
max_vals = train_data.max(0)
ranges = max_vals - min_vals
nums = train_data.shape[0]
normal_data = train_data - np.tile(min_vals, (nums, 1))
normal_data = normal_data / np.tile(ranges, (nums, 1))

# training parameters
k = 3                # number of clusters
max_iterations = 50  # maximum number of iterations
# returns the cluster centers and each sample's cluster index
centroids, cluster = KMeans2.k_means(normal_data, k, max_iterations)

plt.figure(figsize=(12, 5), dpi=80)

# left panel: all raw data (known labels not used), in original units
plt.subplot(1, 2, 1)
plt.scatter(Iris_data[x_axis], Iris_data[y_axis], c='black')
plt.title('raw')

# right panel: clustering result on the normalized data
plt.subplot(1, 2, 2)
for centroid_id, centroid in enumerate(centroids):  # samples, one color per cluster
    current_examples_index = (cluster == centroid_id).flatten()
    plt.scatter(normal_data[current_examples_index, 0], normal_data[current_examples_index, 1])
for centroid_id, centroid in enumerate(centroids):  # cluster centers
    plt.scatter(centroid[0], centroid[1], c='red', marker='x')
plt.title('label k-means')
plt.show()
```
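In this example, the left panel plots raw values (petal length and width in cm), while the right panel plots normalized data in [0, 1]. The centroids returned for `normal_data` therefore cannot be drawn on the raw panel directly. The small sketch below inverts the 0-1 normalization, x = x' * (max - min) + min, using the `min_vals` and `ranges` computed in the normalization step. The function name `denormalize` is hypothetical.

```python
import numpy as np


def denormalize(normalized, min_vals, ranges):
    """Invert column-wise 0-1 normalization (hypothetical helper).

    normalized: array of shape (n, d) in normalized units
    min_vals, ranges: per-column minimum and (max - min) used for normalization
    """
    # broadcasting applies each column's scale and offset to every row
    return np.asarray(normalized, dtype=float) * ranges + min_vals
```

For example, `raw_centroids = denormalize(centroids, min_vals, ranges)` followed by `plt.scatter(raw_centroids[:, 0], raw_centroids[:, 1], c='red', marker='x')` inside the left subplot would overlay the cluster centers in centimeters.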

(2) Method 2
To be added.
4: Some Takeaways