Multivariate cluster analysis
2022-07-06 09:04:00 【Also far away】
1. Code
import pandas as pd
from pandas import DataFrame
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
# Read the input file
datafile = 'student-mat.xlsx'  # Input path; the u prefix is unnecessary in Python 3
outfile = 'stu.xlsx'  # Output path for the labelled data
data = pd.read_excel(datafile)  # The input is an Excel file, so use read_excel; for a CSV file use read_csv
d = DataFrame(data)  # read_excel already returns a DataFrame, so this wrapper is optional
# Clustering
n = 5  # Number of clusters
mod = KMeans(n_clusters=n)
mod.fit_predict(d)  # Fit the model; each row's cluster label is stored in mod.labels_ (all columns of d must be numeric)
# Count the samples in each of the 5 clusters and find the cluster centers
r1 = pd.Series(mod.labels_).value_counts()  # Number of samples in each cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # Cluster centers
summary = pd.concat([r2, r1], axis=1)  # Rows are aligned on the cluster index
summary.columns = list(d.columns) + ['Number of samples']
print(summary)
# Label each row with the cluster it was assigned to
r = pd.concat([d, pd.Series(mod.labels_, index=d.index)], axis=1)
r.columns = list(d.columns) + ['Cluster label']
print(r)
r.to_excel(outfile)  # Only needed if the result should be saved locally
# Visualization: project the features to 2-D with t-SNE
ts = TSNE()
ts.fit_transform(d)  # Embed the features only, not the cluster label
ts = pd.DataFrame(ts.embedding_, index=r.index)
# Plot each cluster with its own marker style
styles = ['r.', 'go', 'g*', 'b.', 'b*']
for label, style in enumerate(styles):
    a = ts[r['Cluster label'] == label]
    plt.plot(a[0], a[1], style)
plt.show()
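The script fixes the number of clusters at 5 and clusters the raw columns as they are. The following is only a sketch, run on synthetic data rather than the student file, of one common way to check that choice: standardize the features, fit KMeans for several cluster counts, and compare silhouette scores. The synthetic centers, the range of k, and the `random_state` values are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spreadsheet (assumption): three well-separated groups.
centers = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=0)

# Put every feature on the same scale so no single column dominates the distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit KMeans for several candidate cluster counts and score each partition.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# A higher silhouette score means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print("best k by silhouette:", best_k)
```

On real data the scores are rarely this clear-cut, so the silhouette value is better treated as one hint alongside domain knowledge than as a rule.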
2. Results
3. Dataset
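KMeans works on numeric features only, so any text columns in the spreadsheet would have to be encoded before clustering. A minimal sketch of one way to do that with one-hot encoding; the miniature table and its column names are hypothetical and are not taken from the post's data file.

```python
import pandas as pd

# Hypothetical miniature table; the column names are illustrative only.
raw = pd.DataFrame({
    "age": [15, 16, 17, 18],
    "absences": [0, 4, 10, 2],
    "school": ["A", "B", "A", "B"],
})

# One-hot encode the text column so that every column is numeric.
numeric = pd.get_dummies(raw, columns=["school"], dtype=float)

print(numeric.dtypes)
```

The resulting frame can be passed to KMeans in place of `d`.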