Clustering with sklearn
2022-07-26 15:12:00 【qq_27390023】
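Unlabeled data can be clustered with the sklearn.cluster module. Each clustering algorithm comes in two variants: a class, which implements a fit method that learns the clusters from training data, and a function, which takes the training data and returns an array of integer labels, one per sample, identifying the cluster each sample belongs to. For the class variant, the labels of the training data are available in the labels_ attribute.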
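The two variants can be compared side by side. The following is a minimal sketch using KMeans and its function counterpart k_means; the variable names (X_demo, class_labels, func_labels) and the explicit n_init=10 are choices made for this illustration only, not part of the original example.

```python
import numpy as np
from sklearn.cluster import KMeans, k_means

# Illustrative data: two well-separated groups of three points each
X_demo = np.array([[1, 2], [1, 4], [1, 0],
                   [10, 2], [10, 4], [10, 0]], dtype=float)

# Class variant: fit the estimator, then read the labels_ attribute
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_demo)
class_labels = model.labels_

# Function variant: returns the centroids, the labels and the inertia directly
centers, func_labels, inertia = k_means(X_demo, n_clusters=2,
                                        n_init=10, random_state=0)

print(class_labels)
print(func_labels)
```

Cluster ids are arbitrary, so the two label arrays are best compared as partitions rather than element by element.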

### 1. KMeans
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
[10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)
print(kmeans.predict([[0, 0], [12, 3]]))
print(kmeans.cluster_centers_)
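### 2. MiniBatchKMeans
# MiniBatchKMeans is a variant of the KMeans algorithm that uses mini-batches
# to reduce computation time while still trying to optimize the same objective function.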
from sklearn.cluster import MiniBatchKMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 0], [4, 4],
[4, 5], [0, 1], [2, 2],
[3, 2], [5, 5], [1, -1]])
# manually fit on batches
kmeans = MiniBatchKMeans(n_clusters=2,
random_state=0,
batch_size=6)
kmeans = kmeans.partial_fit(X[0:6,:])
kmeans = kmeans.partial_fit(X[6:12,:])
print(kmeans.cluster_centers_)
print(kmeans.predict([[0, 0], [4, 4]]))
# fit on the whole data
kmeans = MiniBatchKMeans(n_clusters=2,
random_state=0,
batch_size=6,
max_iter=10).fit(X)
print(kmeans.cluster_centers_)
print(kmeans.predict([[0, 0], [4, 4]]))
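To illustrate that MiniBatchKMeans targets the same objective as KMeans, the sketch below fits both on the same synthetic data and compares the resulting inertia (sum of squared distances to the nearest center). The data set, the cluster centers passed to make_blobs, and the helper total_inertia are assumptions made for this illustration.

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic data: three well-separated blobs (illustrative choice)
X_blobs, _ = make_blobs(n_samples=600,
                        centers=[[0, 0], [6, 6], [0, 6]],
                        cluster_std=0.5, random_state=0)

def total_inertia(model, data):
    """Sum of squared distances from each sample to its nearest center."""
    nearest = model.cluster_centers_[model.predict(data)]
    return float(((data - nearest) ** 2).sum())

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_blobs)
mbk = MiniBatchKMeans(n_clusters=3, n_init=10, batch_size=100,
                      random_state=0).fit(X_blobs)

km_inertia = total_inertia(km, X_blobs)
mbk_inertia = total_inertia(mbk, X_blobs)
agreement = adjusted_rand_score(km.labels_, mbk.labels_)

print(km_inertia, mbk_inertia, agreement)
```

On easy data like this the two inertia values are expected to be close; on harder data the mini-batch result is typically somewhat worse in exchange for shorter fitting time.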
### 3. AffinityPropagation
from sklearn.cluster import AffinityPropagation
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 4], [4, 0]])
clustering = AffinityPropagation(random_state=5).fit(X)
print(clustering)
print(clustering.labels_)
print(clustering.predict([[0, 0], [4, 4]]))
print(clustering.cluster_centers_)
### 4. MeanShift
from sklearn.cluster import MeanShift
import numpy as np
X = np.array([[1, 1], [2, 1], [1, 0],
[4, 7], [3, 5], [3, 6]])
clustering = MeanShift(bandwidth=2).fit(X)
print(clustering.labels_)
print(clustering.predict([[0, 0], [5, 5]]))
print(clustering)
### 5. SpectralClustering
from sklearn.cluster import SpectralClustering
import numpy as np
X = np.array([[1, 1], [2, 1], [1, 0],
[4, 7], [3, 5], [3, 6]])
clustering = SpectralClustering(n_clusters=2,
assign_labels='discretize',
random_state=0).fit(X)
print(clustering.labels_)
print(clustering)
### 6. AgglomerativeClustering
from sklearn.cluster import AgglomerativeClustering
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
[4, 2], [4, 4], [4, 0]])
clustering = AgglomerativeClustering().fit(X)
print(clustering)
print(clustering.labels_)
### 7. DBSCAN
from sklearn.cluster import DBSCAN
import numpy as np
X = np.array([[1, 2], [2, 2], [2, 3],
[8, 7], [8, 8], [25, 80]])
clustering = DBSCAN(eps=3, min_samples=2).fit(X)
print(clustering)
print(clustering.labels_)
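DBSCAN assigns the label -1 to samples it treats as noise, so the number of clusters is not simply the number of distinct labels. The following sketch reuses the same data and parameters as the example above and counts clusters and noise points separately; the names points, n_clusters and n_noise are introduced here for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Same data and parameters as the DBSCAN example above
points = np.array([[1, 2], [2, 2], [2, 3],
                   [8, 7], [8, 8], [25, 80]])
labels = DBSCAN(eps=3, min_samples=2).fit(points).labels_

# Noise samples carry the label -1 and do not form a cluster
n_noise = int(np.sum(labels == -1))
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print(labels)
print(n_clusters, n_noise)
```

Here the isolated point [25, 80] has no neighbor within eps, so it is reported as noise rather than as a cluster of its own.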
### 8. Clustering evaluation
from sklearn import metrics
labels_true = [0, 0, 0, 1, 1, 1] # ground-truth class labels
labels_pred = [0, 0, 1, 1, 2, 2] # clustering result
print(metrics.rand_score(labels_true, labels_pred))
print(metrics.adjusted_rand_score(labels_true, labels_pred))
print(metrics.adjusted_mutual_info_score(labels_true, labels_pred))
print(metrics.homogeneity_score(labels_true, labels_pred))
print(metrics.completeness_score(labels_true, labels_pred))
print(metrics.v_measure_score(labels_true, labels_pred, beta=0.6))
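# Silhouette coefficient: the score lies between -1 and +1. Values near -1 indicate
# incorrect clustering, values near +1 indicate highly dense clusters, and values
# around 0 indicate overlapping clusters.
# X is the sample array defined in the DBSCAN section above.
print(metrics.silhouette_score(X, labels_pred, metric='euclidean'))

Reference: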
https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation
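
The silhouette coefficient needs no ground-truth labels, only the samples and a labeling. As a rough illustration of how to read the score, the sketch below evaluates two labelings of the same six points: one that follows the two visible groups and one that mixes them. The data and the names pts, good and mixed are assumptions made for this example.

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two visibly separate groups of three points each (illustrative data)
pts = np.array([[1, 2], [1, 4], [1, 0],
                [10, 2], [10, 4], [10, 0]])

good = [0, 0, 0, 1, 1, 1]   # labeling that follows the two groups
mixed = [0, 1, 0, 1, 0, 1]  # labeling in which each cluster spans both groups

good_score = silhouette_score(pts, good, metric='euclidean')
mixed_score = silhouette_score(pts, mixed, metric='euclidean')

print(good_score, mixed_score)
```

The labeling that follows the groups should score clearly higher than the mixed one, whose score stays close to 0.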