[machine learning] principle and code of mean shift
2022-07-26 17:26:00 【Demeanor 78】
Mean Shift Introduction
Mean Shift is a density-based, nonparametric clustering algorithm. It assumes that the samples of different clusters follow different probability density distributions. For each sample point, the algorithm finds the direction in which the density increases fastest (this direction is the mean shift). Regions of high sample density correspond to local maxima of the distribution, so the sample points eventually converge to local density maxima, and points that converge to the same local maximum are considered members of the same cluster.
Mean Shift Principle
The goal of mean shift clustering is to find the dense regions in a smooth density of samples. It is a centroid-based algorithm: it works by updating candidate centroids to the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates and form the final set of centroids. Given a candidate centroid $x_i$ at iteration $t$, the candidate is updated according to the following equation:
$$x_i^{t+1} = x_i^t + m(x_i^t)$$
Here $N(x_i)$ is the neighborhood of samples within a given distance around $x_i$, and $m$ is the mean shift vector, computed for each centroid, which points towards the region of maximum increase in the density of points. It is computed with the following formula, which effectively updates a centroid to the mean of the samples in its neighborhood:
$$m(x_i) = \frac{\sum_{x_j \in N(x_i)} K(x_j - x_i)\, x_j}{\sum_{x_j \in N(x_i)} K(x_j - x_i)} - x_i$$
where $K$ is the kernel. With a flat kernel, the first term is the plain average of the neighbors.
The flow of the Mean Shift algorithm can be summarized as:
1. Compute the mean shift vector for each sample.
2. Translate each sample point by its shift vector.
3. Repeat steps 1 and 2 until the samples converge.
4. Treat samples that converge to the same point as members of the same cluster.
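The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the scikit-learn implementation: it assumes a flat kernel, and the function name `mean_shift`, the tolerance `tol`, and the merge radius `bandwidth / 2` are choices made here for the example.

```python
import numpy as np

def mean_shift(X, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel mean shift sketch; returns (modes, labels)."""
    X = np.asarray(X, dtype=float)
    points = X.copy()  # shifted copies of the samples
    for _ in range(max_iter):
        # Distances from every shifted point to every original sample
        dists = np.linalg.norm(points[:, None, :] - X[None, :, :], axis=2)
        within = dists <= bandwidth  # neighborhood N(x_i) as a boolean mask
        # Step 1: mean of the neighbors, i.e. current point plus its shift vector
        means = within.astype(float) @ X / within.sum(axis=1, keepdims=True)
        shift = means - points
        # Step 2: translate every point
        points = means
        # Step 3: stop once no point moves any more
        if np.linalg.norm(shift, axis=1).max() < tol:
            break
    # Step 4: points that ended up at (almost) the same place share a cluster
    modes = []
    labels = np.empty(len(X), dtype=int)
    for i, p in enumerate(points):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < bandwidth / 2:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

X = np.array([[1, 1], [2, 1], [1, 0], [4, 7], [3, 5], [3, 6]])
modes, labels = mean_shift(X, bandwidth=2)
print(modes)
print(labels)
```

On this toy data the first three points and the last three points end up in two separate groups, each centered on the mean of its three samples.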
## Mean Shift Advantages and Disadvantages
Mean Shift does not require the number of clusters to be set in advance, and it can handle clusters of arbitrary shape. It also has few parameters, and its results are relatively stable because, unlike K-means, it does not depend on how the samples are initialized. On the other hand, Mean Shift is computationally expensive for a large feature space, and a poorly chosen parameter strongly affects the result: if bandwidth is set too small, convergence is too slow, and if bandwidth is set too large, some clusters are lost.
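The sensitivity to bandwidth can be checked with a small experiment. The sketch below assumes two well-separated synthetic blobs and three hand-picked bandwidth values (all chosen here for illustration). A very large bandwidth merges everything into one cluster, while a very small one also tends to fragment the data into many tiny clusters.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

# Two tight, well-separated blobs (illustrative data, not from the article)
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [10, 10]],
                  cluster_std=0.4, random_state=0)

def count_clusters(bandwidth):
    """Number of cluster centers MeanShift finds for a given bandwidth."""
    return len(MeanShift(bandwidth=bandwidth).fit(X).cluster_centers_)

counts = {bw: count_clusters(bw) for bw in (0.1, 3.0, 50.0)}
for bw, n in counts.items():
    print(f"bandwidth={bw}: {n} clusters")
```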
Mean Shift Code Implementation
scikit-learn implements the Mean Shift algorithm. It is used as follows:
sklearn.cluster.MeanShift(*, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, n_jobs=None, max_iter=300)
The most important parameter is bandwidth, the bandwidth of the kernel. If it is not specified, it is estimated with sklearn.cluster.estimate_bandwidth. The seeds parameter gives the seeds used to initialize the kernels.
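As a rough illustration of these parameters, the sketch below compares leaving bandwidth unset with passing the estimate explicitly, and also tries `bin_seeding=True`, which starts from a coarse grid of seeds instead of from every sample. The synthetic data and variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Illustrative data: two blobs
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6]],
                  cluster_std=0.5, random_state=0)

# bandwidth=None: MeanShift estimates the bandwidth itself
auto = MeanShift().fit(X)
# Passing the default estimate explicitly should give the same result
manual = MeanShift(bandwidth=estimate_bandwidth(X)).fit(X)
# bin_seeding=True seeds the search from a grid of bins rather than all samples
binned = MeanShift(bandwidth=estimate_bandwidth(X), bin_seeding=True).fit(X)

print(len(auto.cluster_centers_), len(manual.cluster_centers_),
      len(binned.cluster_centers_))

# A fitted model assigns new points to the nearest cluster center
new_labels = auto.predict([[0.0, 0.0], [6.0, 6.0]])
print(new_labels)
```

Binned seeding is usually the faster option on larger data sets, at the cost of a coarser starting grid.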
Example usage:
from sklearn.cluster import MeanShift
import numpy as np
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
clustering = MeanShift(bandwidth=2).fit(X)
Mean Shift Application
# Import related modules and import data sets
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6)
es_bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)
# estimate_bandwidth() estimates the window size (bandwidth) for mean shift.
# With these arguments it randomly draws 500 samples from X, computes the
# distances between them, and returns roughly the 0.2 quantile of those distances.
MS = MeanShift(bandwidth=es_bandwidth)
MS.fit(X)
labels = MS.labels_
cluster_centers = MS.cluster_centers_
uni_labels = np.unique(labels)
n_clusters_ = len(uni_labels)
import matplotlib.pyplot as plt
from itertools import cycle
# Visualize the clustering results of the algorithm
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.show()
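The `quantile` argument passed to `estimate_bandwidth` above controls how wide the resulting window is. The sketch below uses a smaller synthetic data set and three quantile values picked for illustration, and prints how the estimate grows with the quantile.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=2000, centers=centers, cluster_std=0.6,
                  random_state=0)

# Same random subsample each time (random_state=0), only the quantile changes
bandwidths = {
    q: estimate_bandwidth(X, quantile=q, n_samples=500, random_state=0)
    for q in (0.1, 0.2, 0.5)
}
for q, bw in bandwidths.items():
    print(f"quantile={q}: bandwidth={bw:.3f}")
```

A larger quantile gives a larger bandwidth and therefore fewer, broader clusters.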
Mean Shift Practical Applications
Mean Shift is a commonly used clustering algorithm. Some of its practical applications are shown below:
1. Simple clustering
Mean shift clustering is somewhat similar to density-based clustering: starting from each sample point, it finds the corresponding local maximum of the probability density and assigns the point to that maximum, which completes the clustering.
2. Image segmentation
Image segmentation is essentially clustering, but compared with simple clustering it has its own particularities. Mean shift segments an image by clustering in pixel space.
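A minimal sketch of this idea is to treat every pixel's color as a sample and cluster those samples with `MeanShift`. The synthetic two-color image and the bandwidth of 40 are assumptions made for the example. A real image would typically also need the pixel coordinates as features and a faster seeding strategy.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)

# Synthetic 20x20 RGB image: red left half, blue right half, plus mild noise
image = np.zeros((20, 20, 3))
image[:, :10] = [200, 30, 30]
image[:, 10:] = [30, 30, 200]
image += rng.normal(scale=5.0, size=image.shape)

# Every pixel becomes one sample in color space
pixels = image.reshape(-1, 3)
ms = MeanShift(bandwidth=40.0).fit(pixels)

# Label map with one segment id per pixel
segments = ms.labels_.reshape(image.shape[:2])
# Segmented image: each pixel painted with the color of its cluster center
segmented = ms.cluster_centers_[ms.labels_].reshape(image.shape)
print("number of segments:", len(ms.cluster_centers_))
```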

3. Image smoothing
Image smoothing is similar to image segmentation: it also finds the corresponding probability density maximum for each pixel. The main differences are:
a. The iteration does not need to run to convergence; usually one iteration is enough.
b. After the density maximum is found, its color features directly overwrite the pixel's own color features.
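The two points above can be sketched as a single flat-kernel mean shift step per pixel. This is a simplified illustration under stated assumptions: the function name `mean_shift_smooth`, the square spatial window, and both radii are choices made here, not part of any library.

```python
import numpy as np

def mean_shift_smooth(image, spatial_radius=2, color_radius=30.0):
    """One mean shift step per pixel in the joint (position, color) space."""
    image = np.asarray(image, dtype=float)
    h, w, channels = image.shape
    out = np.empty_like(image)
    for y in range(h):
        for x in range(w):
            # Spatial neighborhood: a square window clipped at the image border
            y0, y1 = max(0, y - spatial_radius), min(h, y + spatial_radius + 1)
            x0, x1 = max(0, x - spatial_radius), min(w, x + spatial_radius + 1)
            window = image[y0:y1, x0:x1].reshape(-1, channels)
            # Color neighborhood: keep only pixels with a similar color
            close = np.linalg.norm(window - image[y, x], axis=1) <= color_radius
            # Overwrite the pixel with the mean color of its neighborhood
            out[y, x] = window[close].mean(axis=0)
    return out

rng = np.random.default_rng(0)
image = np.zeros((20, 20, 3))
image[:, :10] = [200, 30, 30]
image[:, 10:] = [30, 30, 200]
image += rng.normal(scale=5.0, size=image.shape)

smoothed = mean_shift_smooth(image)
noisy_std = image[:, :10].std(axis=(0, 1))
smooth_std = smoothed[:, :10].std(axis=(0, 1))
print(noisy_std, smooth_std)
```

Because pixels on the other side of the color edge fall outside the color radius, the noise is averaged away while the edge itself stays sharp.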

4. Contour extraction
Similarly, contour extraction is related to image segmentation; more specifically, it can be built on top of image segmentation. First segment the image with the mean shift algorithm, then take the boundaries between the different regions to obtain a simple contour.
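Once a label map is available, taking the region boundaries needs only a neighbor comparison. The sketch below assumes a label map such as the one produced by a mean shift segmentation; the helper name `label_boundaries` and the tiny hand-made label map are hypothetical and used only for illustration.

```python
import numpy as np

def label_boundaries(segments):
    """Mask of pixels whose right or lower neighbor is in another segment."""
    segments = np.asarray(segments)
    edges = np.zeros(segments.shape, dtype=bool)
    # Compare each pixel with its right-hand neighbor
    edges[:, :-1] |= segments[:, :-1] != segments[:, 1:]
    # Compare each pixel with the neighbor below it
    edges[:-1, :] |= segments[:-1, :] != segments[1:, :]
    return edges

# Hand-made label map: segment 0 on the left, segment 1 on the right
segments = np.zeros((6, 6), dtype=int)
segments[:, 3:] = 1

contour = label_boundaries(segments)
print(contour.astype(int))
```

Here the contour is the single column of pixels that sits directly left of the boundary between the two segments.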

- EOF -

Past highlights
It is suitable for beginners to download the route and materials of artificial intelligence ( Image & Text + video ) Introduction to machine learning series download machine learning and deep learning notes and other information printing 《 Statistical learning method 》 Code reproduction album machine learning communication qq Group 955171419, Please scan the code to join wechat group 
边栏推荐
- 如何使用 align-regexp 对齐 userscript 元信息
- Using MySQL master-slave replication delay to save erroneously deleted data
- Focus on 5g and AI! Next year, zhanrui will promote 7Nm 5g chips and NPU chips!
- [visdrone data set] yolov7 training visdrone data set and results
- Stand aside with four and five rear cameras, LG or push the 16 rear camera mobile phone!
- In the first half of the year, sales increased by 10% against the trend. You can always trust Volvo, which is persistent and safe
- 【无标题】
- Pyqt5 rapid development and practice 3.4 signal and slot correlation
- SQL injection (mind map)
- 机器学习-什么是机器学习、监督学习和无监督学习
猜你喜欢

How does win11 automatically clean the recycle bin?

Operating system migration practice: deploying MySQL database on openeuler

Application of machine vision in service robot

Comparison between dimensional modeling and paradigm modeling

In depth exploration of ribbon load balancing

2019普及组总结
![[basic course of flight control development 2] crazy shell · open source formation UAV - timer (LED flight information light and indicator light flash)](/img/ad/e0bc488c238a260768f7e7faec87d0.png)
[basic course of flight control development 2] crazy shell · open source formation UAV - timer (LED flight information light and indicator light flash)

"Green is better than blue". Why is TPC the last white lotus to earn interest with money

Machine learning - what are machine learning, supervised learning, and unsupervised learning

Quickly build a development platform for enterprise applications
随机推荐
Review the past and know the new MySQL isolation level
【开发教程7】疯壳·开源蓝牙心率防水运动手环-电容触摸
浅谈数据技术人员的成长之路
【无标题】
Implement softmax classification from zero sum using mxnet
Relationship between standardization, normalization and regularization
Oracle is slow to perform a large number of DML operations. Is it the problem of CPU or hard disk?
Avalanche subnets vs. polygon supernets of application chain
Batch normalization batch_ normalization
Redis hotspot key and big value
(25)Blender源码分析之顶层菜单Blender菜单
[express receives get, post, and route request parameters]
[Luogu p8063] shortest paths (graph theory)
How does win11 automatically clean the recycle bin?
机器学习-什么是机器学习、监督学习和无监督学习
Stop using xshell and try this more modern terminal connection tool
Is the rolling update of pod similar to Canary deployment or blue-green deployment?
Concepts and differences of DQL, DML, DDL and DCL
Win11 how to close a shared folder
图解用户登录验证流程,写得太好了!