Multivariate Cluster Analysis
2022-07-06 08:49:00 【亦是远方】
1. Code
import pandas as pd
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
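# Read the file
datafile = 'student-mat.xlsx'  # path to the input file; the u prefix is unnecessary in Python 3, even if the path contains Chinese characters
outfile = 'stu.xlsx'
data = pd.read_excel(datafile)  # datafile is an Excel file, so use read_excel; for a CSV file use read_csv
d = DataFrame(data)  # read_excel already returns a DataFrame, so this wrapping is redundant but harmless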
# Clustering
n = 5  # number of clusters
mod = KMeans(n_clusters=n)
y_pred = mod.fit_predict(d)  # y_pred holds the cluster label of each row (same values as mod.labels_)
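# Count the samples in each of the 5 clusters and get the cluster centers
r1 = pd.Series(mod.labels_).value_counts()  # number of samples in each cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # cluster centers
r = pd.concat([r2, r1], axis=1)
r.columns = list(d.columns) + [u'类别数目']  # column name means "cluster size"
print(r)  # show the summary before r is reused below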
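# Label each row with the cluster it was assigned to
r = pd.concat([d, pd.Series(mod.labels_, index=d.index)], axis=1)
r.columns = list(d.columns) + [u'聚类类别']  # column name means "cluster label"
print(r)
r.to_excel(outfile)  # keep this line only if the result should be saved to disk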
# Visualization
ts = TSNE()
ts.fit_transform(d)  # embed the features only, so the cluster label column does not influence the 2-D layout
ts = pd.DataFrame(ts.embedding_, index=r.index)
a = ts[r[u'聚类类别'] == 0]
plt.plot(a[0], a[1], 'r.')
a = ts[r[u'聚类类别'] == 1]
plt.plot(a[0], a[1], 'go')
a = ts[r[u'聚类类别'] == 2]
plt.plot(a[0], a[1], 'g*')
a = ts[r[u'聚类类别'] == 3]
plt.plot(a[0], a[1], 'b.')
a = ts[r[u'聚类类别'] == 4]
plt.plot(a[0], a[1], 'b*')
plt.show()
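KMeans is based on Euclidean distance, so it needs numeric input, and columns with a large value range tend to dominate the result. The listing above passes the sheet to KMeans as-is, which assumes that student-mat.xlsx contains only numeric columns. If it does not, a preprocessing step along the following lines may help. This is a minimal sketch on a made-up toy table: the column names `school`, `age`, `absences`, the values, and the helper `prepare_features` are illustrative assumptions, not taken from the original data.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def prepare_features(df):
    """One-hot encode text columns, then scale every column to zero mean and unit variance."""
    encoded = pd.get_dummies(df, dtype=float)  # text columns become 0/1 indicator columns
    scaled = StandardScaler().fit_transform(encoded)
    return pd.DataFrame(scaled, index=df.index, columns=encoded.columns)


# Toy data standing in for the real sheet (hypothetical values)
toy = pd.DataFrame({
    'school': ['GP', 'GP', 'MS', 'MS', 'GP', 'MS'],
    'age': [15, 16, 18, 19, 15, 18],
    'absences': [0, 2, 20, 25, 1, 22],
})

features = prepare_features(toy)
mod = KMeans(n_clusters=2, n_init=10, random_state=0)  # fixed seed for repeatable labels
labels = mod.fit_predict(features)
print(pd.Series(labels, index=toy.index).value_counts())
```

With the real data, `features` would take the place of `d` in the call to `fit_predict`, while the unscaled `d` can still be used when writing the labelled rows to Excel.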
2. Results
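The code fixes the number of clusters at 5 without checking it. One common way to compare candidate values is the silhouette score, where values closer to 1 indicate better-separated clusters. The following is a rough sketch on synthetic data from `make_blobs`; the data, the candidate range, and the helper name `silhouette_by_k` are assumptions for illustration only and say nothing about the student data.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def silhouette_by_k(X, candidates):
    """Return {k: silhouette score} for KMeans fitted with each candidate k."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Synthetic stand-in data with 3 well-separated groups (not the student data)
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=1.0, random_state=0)

scores = silhouette_by_k(X, range(2, 7))
best_k = max(scores, key=scores.get)  # candidate with the highest score
print(scores)
print('best k:', best_k)
```

On real data the scores are often much closer together, so the result is better read as one hint among several than as a definitive answer.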
3. Dataset