Multivariate cluster analysis
2022-07-06 09:04:00 【Also far away】
1. Code
import pandas as pd
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Read the file
datafile = 'student-mat.xlsx'  # Input file path; the u prefix (a Python 2 habit for non-ASCII paths) can be omitted
outfile = 'stu.xlsx'  # Output file path
data = pd.read_excel(datafile)  # The input is an Excel file, so use read_excel; for a CSV file use read_csv
d = DataFrame(data)
# Clustering
n = 5  # Number of clusters
mod = KMeans(n_clusters=n)
y_pred = mod.fit_predict(d)  # y_pred holds the cluster label of each sample (same values as mod.labels_)
# Count the samples in each of the 5 clusters and find the cluster centers
r1 = pd.Series(mod.labels_).value_counts()  # Number of samples in each cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # Cluster centers
summary = pd.concat([r2, r1], axis=1)  # One row per cluster; rows are aligned on the cluster label
summary.columns = list(d.columns) + ['Number of samples']
print(summary)
# Label each record with the cluster it was assigned to
r = pd.concat([d, pd.Series(mod.labels_, index=d.index)], axis=1)
r.columns = list(d.columns) + ['Cluster']
print(r)
r.to_excel(outfile)  # Only needed if you want to save the result locally
# Visualization
tsne = TSNE()
embedding = tsne.fit_transform(d)  # Reduce the features to 2 dimensions; the cluster label column is left out
ts = pd.DataFrame(embedding, index=r.index)
# Plot each cluster with its own marker style
for label, style in enumerate(['r.', 'go', 'g*', 'b.', 'b*']):
    a = ts[r['Cluster'] == label]
    plt.plot(a[0], a[1], style)
plt.show()
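The script passes the spreadsheet to KMeans as-is, which only works if every column is numeric and tends to let large-valued columns dominate the distances. The following is a minimal sketch of one possible preprocessing step, not the author's method: it builds a small synthetic table (the column names `age`, `absences` and `school` and all values are assumptions for illustration), keeps the numeric columns, standardizes them, and then clusters.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the student data: two numeric columns and one text column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "age": np.concatenate([rng.normal(15, 0.5, 50), rng.normal(20, 0.5, 50)]),
    "absences": np.concatenate([rng.normal(2, 1, 50), rng.normal(30, 3, 50)]),
    "school": ["GP"] * 50 + ["MS"] * 50,
})

# KMeans works on numbers only, so keep the numeric columns
# (assumption: text columns are dropped rather than encoded).
numeric = demo.select_dtypes(include="number")

# Rescale each column to zero mean and unit variance so that no single
# column dominates the Euclidean distances.
scaled = StandardScaler().fit_transform(numeric)

# random_state makes the run repeatable; n_init is set explicitly because
# its default differs between scikit-learn versions.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(scaled)

# Per-cluster sample counts, comparable to r1 in the script above.
counts = pd.Series(labels, index=demo.index).value_counts().sort_index()
print(counts)
```

Whether to drop or encode text columns depends on the data. Dropping is simply the shortest option for a sketch.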
2. Result
3. Dataset