Multivariate Cluster Analysis
2022-07-06 08:49:00 【亦是远方】
1. Code
import pandas as pd
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
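# Read the input file
datafile = 'student-mat.xlsx'  # path to the input file; the u prefix only matters for non-ASCII paths in Python 2 and is omitted here
outfile = 'stu.xlsx'
data = pd.read_excel(datafile)  # the input is an Excel file, so use read_excel; for a CSV file use read_csv
d = DataFrame(data)  # read_excel already returns a DataFrame, so this call is redundant but harmless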
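# Clustering (KMeans needs numeric input; this assumes every column in d is numeric)
n = 5  # number of clusters
mod = KMeans(n_clusters=n)
y_pred = mod.fit_predict(d)  # y_pred holds the cluster label of each row (same values as mod.labels_)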
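# With the data split into 5 clusters, count the samples in each cluster and get the cluster centers
r1 = pd.Series(mod.labels_).value_counts()  # number of samples in each cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # cluster centers
r = pd.concat([r2, r1], axis=1)  # aligned on the cluster index
r.columns = list(d.columns) + ['count']
print(r)  # show the summary here, because r is reassigned in the next step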
# Label each row with the cluster it was assigned to
r = pd.concat([d, pd.Series(mod.labels_, index=d.index)], axis=1)
r.columns = list(d.columns) + ['cluster']
print(r)
r.to_excel(outfile)  # include this line only if the result should be saved locally
# Visualization
tsne = TSNE()
# Embed the features only (d, not r) so the cluster label does not influence the layout
ts = pd.DataFrame(tsne.fit_transform(d), index=r.index)
# One marker style per cluster, in the same order as the original plot calls
styles = ['r.', 'go', 'g*', 'b.', 'b*']
for k, style in enumerate(styles):
    a = ts[r['cluster'] == k]
    plt.plot(a[0], a[1], style)
plt.show()
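The script depends on student-mat.xlsx, which is not included with the post. To try the clustering and summary steps without that file, the following is a minimal sketch on synthetic data. The helper name `cluster_summary`, the `demo` frame, the fixed `random_state`, and the three-blob data are all assumptions made for illustration and do not come from the original dataset.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def cluster_summary(d: pd.DataFrame, n: int, seed: int = 0):
    """Fit KMeans on d and return (labelled rows, per-cluster centers with counts).

    Hypothetical helper: it wraps the same steps as the script above.
    """
    mod = KMeans(n_clusters=n, n_init=10, random_state=seed)
    labels = pd.Series(mod.fit_predict(d), index=d.index, name="cluster")
    # One row per cluster: center coordinates plus the number of samples
    centers = pd.DataFrame(mod.cluster_centers_, columns=d.columns)
    counts = labels.value_counts().sort_index().rename("count")
    summary = pd.concat([centers, counts], axis=1)
    # Original rows with the assigned cluster appended as a last column
    labelled = pd.concat([d, labels], axis=1)
    return labelled, summary


# Synthetic stand-in for the spreadsheet: three well-separated blobs of 30 points
rng = np.random.default_rng(0)
blobs = [rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0.0, 5.0, 10.0)]
demo = pd.DataFrame(np.vstack(blobs), columns=["x", "y"])

labelled, summary = cluster_summary(demo, n=3)
print(summary)
```

With real data, features on very different scales may need to be standardized first, because KMeans uses Euclidean distance and large-valued columns would otherwise dominate the result.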
2. Results
3. Dataset