Multivariate cluster analysis
2022-07-06 09:04:00 【Also far away】
1. Code
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Read the file
datafile = 'student-mat.xlsx'  # input file (the u'' prefix is unnecessary in Python 3)
outfile = 'stu.xlsx'
# The data is an Excel file, so use read_excel; for a CSV file use read_csv.
# KMeans needs numeric input: this assumes every column is numeric
# (encode text columns first, e.g. with pd.get_dummies).
d = pd.read_excel(datafile)

# Clustering
n = 5  # group the data into 5 clusters
mod = KMeans(n_clusters=n, n_init=10, random_state=0)  # fixed seed for reproducible labels
y_pred = mod.fit_predict(d)  # y_pred holds the cluster label of each row

# Count the samples in each cluster and attach the cluster centers
r1 = pd.Series(y_pred).value_counts()  # number of samples per cluster
r2 = pd.DataFrame(mod.cluster_centers_)  # cluster centers, one row per cluster
summary = pd.concat([r2, r1], axis=1)  # rows align on the cluster label
summary.columns = list(d.columns) + ['cluster_size']
print(summary)

# Mark each row with the cluster it was assigned to
r = pd.concat([d, pd.Series(y_pred, index=d.index)], axis=1)
r.columns = list(d.columns) + ['cluster']
print(r)
r.to_excel(outfile)  # only needed if you want to save the result locally

# Visualization: project the original features (not the label column) to 2-D with t-SNE
ts = TSNE(random_state=0)
emb = pd.DataFrame(ts.fit_transform(d), index=r.index)
styles = ['r.', 'go', 'g*', 'b.', 'b*']  # one marker style per cluster
for k in range(n):
    a = emb[r['cluster'] == k]
    plt.plot(a[0], a[1], styles[k], label=f'cluster {k}')
plt.legend()
plt.show()
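Because student-mat.xlsx is not included here, the following is a minimal, self-contained sketch of the same pipeline run on synthetic data. It assumes the spreadsheet can be replaced by 200 rows of 4 numeric features generated with scikit-learn's `make_blobs`; the helper name `cluster_summary` and the column names `f0`..`f3` are hypothetical. It builds the per-cluster table (center coordinates plus cluster size), the labelled data, and a 2-D t-SNE embedding, without plotting.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE


def cluster_summary(d: pd.DataFrame, n: int = 5, seed: int = 0):
    """Fit KMeans on d (hypothetical helper) and return (summary, labelled, embedding).

    summary:   one row per cluster with the center coordinates and the cluster size
    labelled:  d with an extra 'cluster' column
    embedding: 2-D t-SNE projection of the features (label column excluded)
    """
    mod = KMeans(n_clusters=n, n_init=10, random_state=seed)
    labels = mod.fit_predict(d)

    # Centers keep the feature names; sizes are indexed by cluster label 0..n-1
    centers = pd.DataFrame(mod.cluster_centers_, columns=d.columns)
    sizes = pd.Series(labels).value_counts().sort_index()
    summary = centers.assign(cluster_size=sizes)  # aligns on the cluster label

    labelled = d.assign(cluster=labels)

    # Embed only the original features so the label does not influence the layout
    emb = TSNE(n_components=2, init="pca", random_state=seed).fit_transform(d.to_numpy())
    embedding = pd.DataFrame(emb, index=d.index, columns=["x", "y"])
    return summary, labelled, embedding


# Synthetic stand-in for the spreadsheet (assumption): 200 rows, 4 numeric features, 5 blobs
X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=42)
d = pd.DataFrame(X, columns=[f"f{i}" for i in range(4)])

summary, labelled, embedding = cluster_summary(d, n=5)
print(summary)
print(labelled["cluster"].value_counts())
```

To draw the result, plot `embedding["x"]` against `embedding["y"]` for each value of `labelled["cluster"]`, as the script above does with one marker style per cluster.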
2. Results


3. Dataset
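KMeans only accepts numeric features, and because it is distance-based, columns with large ranges dominate the clustering. If the spreadsheet contains text columns or features on very different scales, one possible preparation step is sketched below, assuming one-hot encoding with `pd.get_dummies` followed by scikit-learn's `StandardScaler`. The helper `prepare_features` and the example rows are hypothetical and not taken from student-mat.xlsx.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one-hot encode text columns, then scale every column to mean 0, std 1."""
    encoded = pd.get_dummies(df, dtype=float)  # text columns -> 0/1 indicator columns
    scaled = StandardScaler().fit_transform(encoded)
    return pd.DataFrame(scaled, columns=encoded.columns, index=df.index)


# Made-up rows mixing a text column with numeric columns (assumption, not real data)
raw = pd.DataFrame({
    "sex": ["F", "M", "F", "M"],
    "age": [15, 17, 16, 18],
    "G3": [10, 14, 8, 19],
})
X = prepare_features(raw)
print(X)
```

The prepared frame can be passed to KMeans in place of `d`. Note that the cluster centers are then expressed in standardized units rather than the original ones.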