Multivariate cluster analysis
2022-07-06 09:04:00 【Also far away】
1. Code
import pandas as pd
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Read the file
datafile = u'student-mat.xlsx'  # File path; the u prefix was meant for paths with non-ASCII (e.g. Chinese) characters and can be omitted in Python 3
outfile = 'stu.xlsx'
data = pd.read_excel(datafile)  # datafile is an Excel file, so use read_excel; for a CSV file use read_csv
d = DataFrame(data).select_dtypes(include='number')  # keep numeric columns only; KMeans cannot handle text columns
# Clustering
n = 5  # Cluster the data into 5 classes
mod = KMeans(n_clusters=n, n_init=10, random_state=0)  # fixed seed so the labels are reproducible
y_pred = mod.fit_predict(d)  # y_pred holds the cluster label of each sample
# Count the samples in each cluster and find the cluster centers
r1 = pd.Series(mod.labels_).value_counts()  # number of samples in each cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # cluster centers
r = pd.concat([r2, r1], axis=1)  # rows are aligned on the cluster label
r.columns = list(d.columns) + ['Cluster size']
print(r)
# Mark each row with the cluster it was assigned to
r = pd.concat([d, pd.Series(mod.labels_, index=d.index)], axis=1)
r.columns = list(d.columns) + ['Cluster label']
print(r)
r.to_excel(outfile)  # Only needed if you want to save the result locally
# Visualization: embed the features in 2-D with t-SNE
ts = TSNE(random_state=0)
ts.fit_transform(d)  # embed the feature columns only, not the label column
ts = pd.DataFrame(ts.embedding_, index=r.index)
# One marker style per cluster (styles repeat if n > 5)
styles = ['r.', 'go', 'g*', 'b.', 'b*']
for label in range(n):
    a = ts[r['Cluster label'] == label]
    plt.plot(a[0], a[1], styles[label % len(styles)])
plt.show()
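The script fixes n = 5 by hand and clusters the raw columns. K-means relies on Euclidean distance, so a column with a large numeric range can dominate the result; a common sanity check is to standardize the features and compare silhouette scores over several values of k. The sketch below is an illustration under assumptions, not the author's method: it uses synthetic data from `make_blobs` as a stand-in for student-mat.xlsx, and `silhouette_by_k` is a hypothetical helper.

```python
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the spreadsheet (assumption): 300 samples, 4 numeric features, 5 groups
X, _ = make_blobs(n_samples=300, n_features=4, centers=5, random_state=0)
X[:, 0] *= 1000  # inflate one column's scale, as a raw count or score column might be


def silhouette_by_k(features, k_values, seed=0):
    """Hypothetical helper: standardize the features, then return {k: silhouette score}."""
    scaled = StandardScaler().fit_transform(features)  # zero mean, unit variance per column
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(scaled)
        scores[k] = silhouette_score(scaled, labels)  # mean silhouette over all samples
    return scores


scores = silhouette_by_k(X, range(2, 9))
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

A silhouette closer to 1 means tighter, better-separated clusters. It is one heuristic among several (the elbow method on `KMeans.inertia_` is another), so treat the suggested k as a starting point rather than a definitive answer.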
2. Results
3. Data set