Multivariate cluster analysis
2022-07-06 09:04:00 【Also far away】
1. Code
import pandas as pd
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Read the input file
datafile = 'student-mat.xlsx'  # Input path; the u prefix is unnecessary in Python 3
outfile = 'stu.xlsx'  # Output path for the labeled data
data = pd.read_excel(datafile)  # Excel input, so use read_excel; for a CSV file use read_csv
d = DataFrame(data)  # KMeans below expects every column to be numeric
# Clustering
n = 5  # Number of clusters
mod = KMeans(n_clusters=n)
mod.fit_predict(d)  # Fits the model; the cluster labels are also stored in mod.labels_
# Count the samples in each of the 5 clusters and find the cluster centers
r1 = pd.Series(mod.labels_).value_counts()  # Number of samples in each cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # Cluster centers
r = pd.concat([r2, r1], axis=1)  # One row per cluster: center coordinates plus sample count
r.columns = list(d.columns) + ['Cluster size']
# Label each row with the cluster it was assigned to
r = pd.concat([d, pd.Series(mod.labels_, index=d.index)], axis=1)
r.columns = list(d.columns) + ['Cluster']
print(r)
r.to_excel(outfile)  # Only needed if the result should be saved locally
# Visualization: project the data to 2D with t-SNE and plot each cluster
# Note: r includes the 'Cluster' column, so the labels also influence the embedding
ts = pd.DataFrame(TSNE().fit_transform(r), index=r.index)
styles = ['r.', 'go', 'g*', 'b.', 'b*']  # One marker style per cluster
for k, style in enumerate(styles):
    a = ts[r['Cluster'] == k]
    plt.plot(a[0], a[1], style)
plt.show()
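The script above depends on student-mat.xlsx, which is not included here. As a rough, self-contained sketch of the same clustering and summary steps, the version below runs on synthetic data instead. The helper name `cluster_and_summarize`, the column names `f0` to `f3`, `Cluster` and `Size`, and the `make_blobs` data are all assumptions made for illustration; they are not part of the original script. The fixed `random_state` and `n_init` values are likewise only there to make the run repeatable.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def cluster_and_summarize(features: pd.DataFrame, n_clusters: int = 5, seed: int = 0):
    """Fit KMeans and return (labeled rows, per-cluster summary).

    Hypothetical helper: it mirrors the steps of the script above,
    but is not taken from it.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = pd.Series(model.fit_predict(features), index=features.index, name="Cluster")

    # One row per cluster: center coordinates plus the number of samples.
    centers = pd.DataFrame(model.cluster_centers_, columns=features.columns)
    sizes = labels.value_counts().sort_index().rename("Size")
    summary = pd.concat([centers, sizes], axis=1)

    # Original rows with the assigned cluster appended as a last column.
    labeled = pd.concat([features, labels], axis=1)
    return labeled, summary


# Synthetic stand-in for student-mat.xlsx (assumption: all columns are numeric).
X, _ = make_blobs(n_samples=200, centers=5, n_features=4, random_state=0)
features = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])

labeled, summary = cluster_and_summarize(features)
print(summary)
print(labeled.head())
```

To try this on real data, replace `features` with a numeric DataFrame loaded through `pd.read_excel`; any text columns would need to be encoded or dropped first, since KMeans only accepts numeric input.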
2. Result
3. Dataset