A Brief Overview of Cluster Analysis
2022-06-24 12:33:00 【Foneone】
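Evaluation metrics for cluster analysis
A clustering is evaluated in terms of its clusters: similarity within a cluster should be high, and similarity between clusters should be low.
There are two kinds of evaluation metrics: (1) external indices, which compare the result against a reference model; (2) internal indices, which examine the clustering result directly.
External indices (comparison against a reference model):
(1) Jaccard coefficient (JC)
(2) Fowlkes and Mallows index (FMI)
(3) Rand index (RI)
(4) Adjusted Rand index (ARI)
One problem with RI is that for a random clustering it is not guaranteed to be close to 0 (it can even be quite large). ARI corrects for this by using the expected RI under random clustering, E[RI]: ARI = (RI - E[RI]) / (max(RI) - E[RI]).
JC, FMI, and RI all lie in [0,1]; ARI is at most 1 and can be negative when the agreement is worse than chance. For all four, a larger value indicates a better clustering.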
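The four external indices can all be derived from counts of sample pairs. The following is a minimal sketch, not a reference implementation: the function names `pair_counts` and `external_indices` are made up for illustration, and the code assumes each partition puts at least one pair of samples in the same cluster (otherwise some denominators are zero).

```python
from itertools import combinations
from math import sqrt

def pair_counts(pred, ref):
    """Count sample pairs by how the clustering and the reference treat them."""
    a = b = c = d = 0
    for i, j in combinations(range(len(pred)), 2):
        same_pred = pred[i] == pred[j]
        same_ref = ref[i] == ref[j]
        if same_pred and same_ref:
            a += 1  # together in both
        elif same_pred:
            b += 1  # together in the clustering only
        elif same_ref:
            c += 1  # together in the reference only
        else:
            d += 1  # separated in both
    return a, b, c, d

def external_indices(pred, ref):
    """Return JC, FMI, RI and ARI for two labelings of the same samples."""
    a, b, c, d = pair_counts(pred, ref)
    total = a + b + c + d
    jc = a / (a + b + c)
    fmi = sqrt((a / (a + b)) * (a / (a + c)))
    ri = (a + d) / total
    # Pair-count form of ARI: (index - expected) / (max index - expected)
    expected = (a + b) * (a + c) / total
    max_index = ((a + b) + (a + c)) / 2
    ari = (a - expected) / (max_index - expected)
    return {"JC": jc, "FMI": fmi, "RI": ri, "ARI": ari}

# A clustering that cuts across the reference classes:
scores = external_indices([0, 0, 1, 1], [0, 1, 0, 1])
print(scores)
```

In this small example RI stays well above 0 even though no pair is grouped correctly, while ARI comes out negative, which is the behavior the adjustment is meant to expose.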
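Internal indices (examining the clustering result directly):
(1) Davies-Bouldin index (DBI): for a pair of clusters, take the sum of their average within-cluster sample distances divided by the distance between the two cluster centers; for each cluster keep the largest such ratio over all other clusters, then average these values over all clusters.
(2) Dunn index (DI): the smallest distance between any two clusters divided by the largest cluster diameter.
A smaller DBI is better; a larger DI is better.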
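As an illustration of the two internal indices, here is a small sketch using Euclidean distance. It assumes the within-cluster spread is the average pairwise distance between samples (some libraries use the average distance to the centroid instead), that there are at least two clusters with distinct centers, and that at least one cluster has two distinct points. The function names are chosen for this example only.

```python
from itertools import combinations
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def avg_pairwise(points):
    """Average distance between all pairs of samples in one cluster."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0  # a singleton cluster has no spread
    return sum(dist(p, q) for p, q in pairs) / len(pairs)

def davies_bouldin(clusters):
    centers = [centroid(c) for c in clusters]
    spread = [avg_pairwise(c) for c in clusters]
    k = len(clusters)
    worst = [
        max((spread[i] + spread[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

def dunn(clusters):
    # Smallest distance between samples of different clusters
    min_sep = min(dist(p, q)
                  for ci, cj in combinations(clusters, 2)
                  for p in ci for q in cj)
    # Largest distance between samples of the same cluster
    max_diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam

clusters = [[(0, 0), (0, 1)], [(3, 0), (3, 1)]]
print(davies_bouldin(clusters), dunn(clusters))
```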
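F-measure: in general this is the F_β score, which weights recall β times as heavily as precision; when β = 1 it is the standard F1 score. Accuracy and recall are also used as evaluation measures.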
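For reference, the F_β score can be written as a short function of precision and recall. This is a sketch; how precision and recall are obtained for a clustering (for example from sample pairs or from a cluster-to-class matching) is left open here, and returning 0 when both are 0 is a convention chosen for this example.

```python
def f_beta(precision, recall, beta=1.0):
    """F_beta score; beta=1 gives the standard F1 score."""
    if precision == 0 and recall == 0:
        return 0.0  # convention for this sketch: avoid 0/0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.5, 1.0))          # F1
print(f_beta(0.5, 1.0, beta=2))  # recall weighted more heavily
```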
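Entropy: looks at the probability distribution of the reference classes within each cluster, using the entropy formula. A smaller value means less uncertainty and a better clustering.
Purity: the higher the purity, the better the clustering. Purity and entropy both take a probabilistic view. See reference link 2.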
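Both measures can be computed from the class composition of each cluster. The sketch below assumes the common definitions: purity is the fraction of samples that belong to the majority class of their cluster, and the overall entropy is the size-weighted average of the per-cluster class entropies (base 2 here). The helper names are illustrative only.

```python
from collections import Counter
from math import log2

def group_by_cluster(pred, ref):
    """Map each cluster label to the reference classes of its members."""
    groups = {}
    for p, r in zip(pred, ref):
        groups.setdefault(p, []).append(r)
    return groups

def purity(pred, ref):
    groups = group_by_cluster(pred, ref)
    majority = sum(Counter(m).most_common(1)[0][1] for m in groups.values())
    return majority / len(pred)

def clustering_entropy(pred, ref):
    total = 0.0
    for members in group_by_cluster(pred, ref).values():
        n = len(members)
        h = -sum((c / n) * log2(c / n) for c in Counter(members).values())
        total += (n / len(pred)) * h  # weight by cluster size
    return total

pred = [0, 0, 0, 1, 1, 1]
ref = ["a", "a", "b", "b", "b", "b"]
print(purity(pred, ref), clustering_entropy(pred, ref))
```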
NMI (normalized mutual information) and MI (mutual information) are also evaluation metrics.
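A minimal sketch of MI and NMI from label counts follows. Several normalizations are in use; this example assumes division by the arithmetic mean of the two label entropies, and returns 0 when either labeling has zero entropy, which is a convention picked for this sketch.

```python
from collections import Counter
from math import log

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(pred, ref):
    n = len(pred)
    joint = Counter(zip(pred, ref))
    pred_counts = Counter(pred)
    ref_counts = Counter(ref)
    return sum(
        (c / n) * log(c * n / (pred_counts[p] * ref_counts[r]))
        for (p, r), c in joint.items()
    )

def nmi(pred, ref):
    h_pred, h_ref = entropy(pred), entropy(ref)
    if h_pred == 0 or h_ref == 0:
        return 0.0  # convention for this sketch
    return mutual_information(pred, ref) / ((h_pred + h_ref) / 2)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))  # same partition, labels renamed
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))  # independent partitions
```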
The silhouette coefficient is another evaluation method; it combines cohesion and separation to assess the result.
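To make the two ingredients concrete: for each sample, cohesion a is the mean distance to the other members of its own cluster, separation b is the smallest mean distance to the members of any other cluster, and the sample's score is (b - a) / max(a, b). The sketch below averages these scores, uses Euclidean distance, and scores a sample as 0 when it is alone in its cluster or when there is only one cluster; these are assumptions of this example.

```python
from math import dist

def silhouette(points, labels):
    """Mean silhouette score over all samples."""
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        by_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if j != i:
                by_cluster.setdefault(label, []).append(dist(p, q))
        own = by_cluster.pop(own_label, None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or single-cluster case
            continue
        a = sum(own) / len(own)                                  # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())    # separation
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0, 0), (0, 1), (3, 0), (3, 1)]
print(silhouette(points, [0, 0, 1, 1]))  # compact, well separated clusters
print(silhouette(points, [0, 1, 0, 1]))  # clusters that cut across the groups
```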
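Cophenetic correlation coefficient: an evaluation method for hierarchical clustering results.
Reference: link 1 contains a table that gives a particularly thorough summary.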
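Categories of cluster analysis
(1) Prototype-based clustering: starts from a set of prototypes that characterize the clusters (a prototype amounts to assuming that each cluster already has a representative center point).
K-means, bisecting K-means, and LVQ (Learning Vector Quantization, which assumes the samples carry class labels) all characterize clusters with prototype vectors.
Gaussian mixture clustering characterizes clusters with a probabilistic model.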
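As an example of the prototype-vector idea, here is a minimal K-means sketch (Lloyd's iteration): assign each sample to its nearest prototype, then move each prototype to the mean of its samples. The initial centers are passed in explicitly to keep the example deterministic; real implementations usually pick them randomly or with a seeding scheme, and `max_iter` is a safeguard added for this sketch.

```python
from math import dist

def kmeans(points, init_centers, max_iter=100):
    """Return (labels, centers) after alternating assignment and update steps."""
    centers = [tuple(c) for c in init_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest prototype for every sample
        labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        # Update step: each prototype moves to the mean of its samples
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's prototype
        if new_centers == centers:
            break  # prototypes stopped moving
        centers = new_centers
    return labels, centers

labels, centers = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], [(0, 0), (5, 5)])
print(labels, centers)
```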
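(2) Density-based clustering: clusters are determined by how tightly the samples are distributed (by looking at how densely the points around a given sample are packed).
DBSCAN characterizes density through "neighborhoods"; OPTICS and DENCLUE are other density-based methods.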
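The neighborhood idea behind DBSCAN can be sketched as follows: a sample with at least `min_pts` samples (itself included) within radius `eps` is a core point, clusters grow outward from core points, and samples reachable from no core point are marked as noise. This is a simplified brute-force sketch with Euclidean distance; the parameter names and the `NOISE` marker are choices made for this example.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point; NOISE marks points in no cluster."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # eps-neighborhood of point i, including i itself
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, keep growing
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```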
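(3) Hierarchical clustering: partitions the dataset at different levels, producing a tree-shaped cluster structure.
AGNES works bottom-up and DIANA works top-down. BIRCH and CHAMELEON are further hierarchical methods, and CLARANS, which is usually classed as a partitioning method, is listed alongside them (these are not covered in the book).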
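The bottom-up strategy of AGNES can be sketched as: start with every sample in its own cluster and repeatedly merge the two closest clusters until the desired number remains. The sketch below assumes single linkage (the distance between two clusters is the distance between their closest samples) and Euclidean distance; other linkages such as complete or average are equally common, and the brute-force search is only meant for small inputs.

```python
from itertools import combinations
from math import dist

def agnes(points, k):
    """Merge clusters bottom-up until only k clusters remain."""
    clusters = [[p] for p in points]

    def linkage(ci, cj):
        # Single linkage: closest pair of samples across the two clusters
        return min(dist(p, q) for p in ci for q in cj)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters

points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
print(agnes(points, 3))
print(agnes(points, 2))
```

Recording the order of merges and the linkage distance at each step would give the tree structure (dendrogram) that the text describes.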