[machine learning] principle and code of mean shift
2022-07-26 17:26:00 【Demeanor 78】
## Introduction to Mean Shift
Mean Shift is a density-based, nonparametric clustering algorithm. It assumes that the samples of different clusters follow different probability density distributions. For every sample point, the algorithm finds the direction in which the density increases fastest (this direction is the mean shift) and moves the point along it. Regions of high sample density correspond to maxima of the distribution, so the sample points eventually converge to local density maxima, and points that converge to the same local maximum are considered members of the same cluster.
## Principle of Mean Shift
Mean shift clustering aims to find dense regions ("blobs") in a smooth density of samples. It is a centroid-based algorithm: it works by updating candidate centroids to the mean of the points within a given region. The candidates are then filtered in a post-processing stage to eliminate near-duplicates and form the final set of centroids. Given a candidate centroid $x_i$ at iteration $t$, the candidate is updated according to the following equation:

$$x_i^{t+1} = x_i^t + m(x_i^t)$$

where $N(x_i)$ is the neighborhood of samples within a given distance around $x_i$, and $m$ is the mean shift vector computed for each centroid, which points toward the region of maximum increase in the density of points. It is calculated with the following formula, where $K$ is a kernel function that weights each neighbor (a flat kernel gives the plain average). The update effectively moves a centroid to the mean of the samples in its neighborhood:

$$m(x_i) = \frac{\sum_{x_j \in N(x_i)} K(x_j - x_i)\, x_j}{\sum_{x_j \in N(x_i)} K(x_j - x_i)} - x_i$$
The flow of the Mean Shift algorithm can be understood as:
1. Calculate the mean shift vector of each sample.
2. Translate each sample point by its mean shift vector.
3. Repeat steps 1 and 2 until the samples converge.
4. Samples that converge to the same point are considered members of the same cluster.
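The four steps can be sketched in plain NumPy. This is a minimal illustration and not scikit-learn's implementation: it assumes a flat kernel (every neighbor within `bandwidth` counts equally), and the function name `mean_shift`, the tolerance `tol`, and the merge radius `bandwidth / 2` are choices made here for the example.

```python
import numpy as np


def mean_shift(X, bandwidth, max_iter=300, tol=1e-3):
    """Flat-kernel mean shift sketch; returns (cluster_centers, labels)."""
    X = np.asarray(X, dtype=float)
    points = X.copy()
    for _ in range(max_iter):
        # Step 1: the mean shift vector of each point is the mean of the
        # original samples within `bandwidth` of it, minus the point itself.
        dists = np.linalg.norm(points[:, None, :] - X[None, :, :], axis=2)
        weights = (dists <= bandwidth).astype(float)
        means = weights @ X / weights.sum(axis=1, keepdims=True)
        shift = means - points
        # Step 2: translate every point by its mean shift vector.
        points = means
        # Step 3: stop once no point moves by more than `tol`.
        if np.linalg.norm(shift, axis=1).max() < tol:
            break
    # Step 4: points that ended up (almost) at the same place share a cluster.
    # The merge radius bandwidth / 2 is an assumption made for this sketch.
    centers = []
    labels = np.empty(len(X), dtype=int)
    for i, p in enumerate(points):
        for k, c in enumerate(centers):
            if np.linalg.norm(p - c) < bandwidth / 2:
                labels[i] = k
                break
        else:
            centers.append(p)
            labels[i] = len(centers) - 1
    return np.array(centers), labels


X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
centers, labels = mean_shift(X, bandwidth=2)
print(labels)
print(centers)
```

The pairwise distance matrix makes every iteration quadratic in the number of samples, which is acceptable for a toy data set but not for large ones.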
## Advantages and disadvantages of the Mean Shift algorithm
Mean Shift does not require the number of clusters to be set in advance, and it can handle clusters of arbitrary shape. It also needs few parameters, and its results are relatively stable because, unlike K-means, it does not depend on how the samples are initialized. On the other hand, Mean Shift is computationally expensive for a large feature space, and poorly chosen parameters strongly affect the result: if bandwidth is set too small, convergence is too slow, and if bandwidth is set too large, some clusters are lost.
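The sensitivity to bandwidth can be checked with a small experiment. The sketch below assumes scikit-learn is installed. The three bandwidth values and the fixed `random_state` are arbitrary choices made for illustration, not recommended settings.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Three blobs, so a well-chosen bandwidth should find about three clusters.
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.6, random_state=0)

n_clusters = {}
for bandwidth in (0.3, 1.0, 5.0):
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    n_clusters[bandwidth] = len(ms.cluster_centers_)
    print(f"bandwidth={bandwidth}: {n_clusters[bandwidth]} clusters")
```

A very small bandwidth tends to split the data into many tiny clusters, while a very large one merges the blobs together.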
## Code implementation of Mean Shift
scikit-learn implements the Mean Shift algorithm as MeanShift, which is used as follows:
sklearn.cluster.MeanShift(*, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, n_jobs=None, max_iter=300)
The most important parameter is bandwidth, the bandwidth of the kernel (scikit-learn's implementation uses a flat kernel). If it is not specified, it is estimated with sklearn.cluster.estimate_bandwidth. The seeds parameter gives the seeds used to initialize the kernels. If it is not specified, every sample is used as a seed, or a coarser grid of binned points is used when bin_seeding=True.
Example usage:
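```python
from sklearn.cluster import MeanShift
import numpy as np

X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
clustering = MeanShift(bandwidth=2).fit(X)
print(clustering.labels_)           # cluster index of each sample
print(clustering.cluster_centers_)  # coordinates of the cluster centers
```
## Application of Mean Shift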
```python
# Import the required modules
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6)

# estimate_bandwidth() produces the window size (bandwidth) for mean shift.
# With these arguments it randomly picks 500 samples from X, looks at the
# distances between them, and derives the bandwidth from the 0.2 quantile
# of those distances.
es_bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)

# Fit Mean Shift with the estimated bandwidth
MS = MeanShift(bandwidth=es_bandwidth)
MS.fit(X)
labels = MS.labels_
cluster_centers = MS.cluster_centers_
uni_labels = np.unique(labels)
n_clusters_ = len(uni_labels)

# Visualize the clustering results of the algorithm
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    # Samples of cluster k as dots, its center as a large circle
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.show()
```
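Because every sample is used as a seed by default, fitting 10000 points as in the example above can take a while. One possible speed-up, assuming a coarser seeding is acceptable for the data at hand, is the bin_seeding=True parameter listed in the MeanShift signature. The fixed `random_state` values and the timing code are additions made here so that the sketch is reproducible.

```python
import time

from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6, random_state=0)
bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500, random_state=0)

# bin_seeding=True seeds the search from a grid of binned points
# instead of starting one kernel at every single sample.
start = time.perf_counter()
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(X)
elapsed = time.perf_counter() - start

print(f"{len(ms.cluster_centers_)} clusters found in {elapsed:.2f} s")
```

The min_bin_freq parameter can additionally discard bins that contain too few samples, which reduces the number of seeds further.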
## Practical applications of Mean Shift
Mean Shift is a commonly used clustering algorithm. Some of its practical applications are shown below:
1. Simple clustering
Mean shift clustering is somewhat similar to density-based clustering: starting from each sample point, it finds the corresponding local maximum of the probability density and assigns the point to that maximum, which completes the clustering.
2. Image segmentation
Image segmentation is essentially clustering, but compared with simple clustering it has its own particularities. Mean shift achieves image segmentation by clustering in the pixel space.
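As a rough sketch of this idea, each pixel can be treated as one sample whose features are its color channels. The synthetic two-tone image, the hand-picked bandwidth of 0.3, and the use of color alone (without pixel coordinates) are simplifying assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)

# Synthetic 20x20 RGB image: dark left half, bright right half, mild noise.
height, width = 20, 20
image = np.empty((height, width, 3))
image[:, : width // 2] = 0.1
image[:, width // 2 :] = 0.9
image += rng.normal(scale=0.02, size=image.shape)

# Each pixel becomes one sample whose features are its color channels.
pixels = image.reshape(-1, 3)
ms = MeanShift(bandwidth=0.3).fit(pixels)  # bandwidth picked by hand for this toy image

# Reshape the per-pixel cluster labels back into an image-shaped segment map.
segments = ms.labels_.reshape(height, width)
print("number of segments:", len(np.unique(segments)))
```

On real images the pixel coordinates are often appended to the color features so that segments stay spatially coherent.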

3. Image smoothing
Image smoothing is similar to image segmentation: it also finds the corresponding probability density maximum for each pixel. The main differences are:
a. The iterative process does not need to run to convergence; usually one iteration is enough.
b. Once the probability density maximum is found, its color features directly overwrite the pixel's own color features.
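A single smoothing iteration can be sketched with NumPy as follows. The helper name `mean_shift_smooth`, the flat kernel, and the restriction to color distance (ignoring where a pixel sits in the image) are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic noisy 20x20 RGB image: dark left half, bright right half.
height, width = 20, 20
image = np.empty((height, width, 3))
image[:, : width // 2] = 0.1
image[:, width // 2 :] = 0.9
image += rng.normal(scale=0.05, size=image.shape)


def mean_shift_smooth(image, bandwidth):
    """Apply one flat-kernel mean shift step in color space to every pixel."""
    colors = image.reshape(-1, image.shape[-1])
    # Pairwise color distances between all pixels (fine for a tiny image).
    dists = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)
    weights = (dists <= bandwidth).astype(float)
    # Each pixel takes the mean color of its neighbors in color space.
    shifted = weights @ colors / weights.sum(axis=1, keepdims=True)
    return shifted.reshape(image.shape)


smoothed = mean_shift_smooth(image, bandwidth=0.3)
print("noise before:", image[:, : width // 2].std())
print("noise after: ", smoothed[:, : width // 2].std())
```

The boundary between the two halves stays sharp because colors from the other half lie outside the bandwidth and do not enter the average.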

4. Contour extraction
Likewise, contour extraction is similar to image segmentation. More precisely, contour extraction can be built on image segmentation: first segment the image with the mean shift algorithm, then take the edges between the different regions to obtain a simple contour.
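The second step, taking the edges between regions, can be sketched on a label map. The helper `label_contours` and the small hand-written segment map are hypothetical stand-ins for the output of a mean shift segmentation.

```python
import numpy as np


def label_contours(segments):
    """Boolean mask of pixels whose right or lower neighbor has another label."""
    contours = np.zeros(segments.shape, dtype=bool)
    # Compare every pixel with its right-hand neighbor.
    contours[:, :-1] |= segments[:, :-1] != segments[:, 1:]
    # Compare every pixel with the neighbor below it.
    contours[:-1, :] |= segments[:-1, :] != segments[1:, :]
    return contours


# Hypothetical 4x6 segment map standing in for a mean shift segmentation result.
segments = np.array([
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [2, 2, 2, 1, 1, 1],
    [2, 2, 2, 1, 1, 1],
])
contours = label_contours(segments)
print(contours.astype(int))
```

The mask marks only the pixel on the left or upper side of each boundary, which keeps the contour one pixel wide.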
