[Machine Learning] Principle and Code of Mean Shift

2022-07-26 17:26:00 【Demeanor 78】

Introduction to Mean Shift

Mean shift is a density-based, nonparametric clustering algorithm. It assumes that the samples of each cluster follow their own probability density distribution, and for every sample point it looks for the direction in which the density increases fastest (the shift along this direction is the "mean shift"). Regions of high sample density correspond to the maxima of the distribution, so the sample points eventually converge to local density maxima, and points that converge to the same local maximum are treated as members of the same cluster.

Mean Shift Principle

The goal of mean shift clustering is to discover blobs in a smooth density of samples. It is a centroid-based algorithm that works by updating candidate centroids to the mean of the points within a given region. These candidates are then filtered in a post-processing stage to eliminate near-duplicates, forming the final set of centroids. Given a candidate centroid $x_i$ at iteration $t$, the candidate is updated according to the following equation:

$$x_i^{t+1} = x_i^t + m(x_i^t)$$

where $N(x_i)$ is the neighborhood of samples within a given distance around $x_i$, and $m$ is the mean shift vector, computed for each centroid, that points toward the region of maximum increase in point density. For a flat kernel it is computed with the following formula, which effectively updates a centroid to the mean of the samples in its neighborhood:

$$m(x_i) = \frac{1}{|N(x_i)|} \sum_{x_j \in N(x_i)} x_j - x_i$$
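
To make a single update concrete, here is a minimal NumPy sketch. It assumes a flat kernel (every sample inside the window counts equally) and takes the mean shift vector to be the neighborhood mean minus the current position. The function name `mean_shift_step` and the toy data are illustrative choices, not part of any library.

```python
import numpy as np

def mean_shift_step(x, X, bandwidth):
    """Return the mean shift vector m(x) for a flat kernel of radius `bandwidth`."""
    # N(x): the samples that lie within `bandwidth` of the candidate centroid x
    in_window = np.linalg.norm(X - x, axis=1) <= bandwidth
    # m(x): mean of the neighborhood minus the current position
    return X[in_window].mean(axis=0) - x

X = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 0.0],
              [4.0, 7.0], [3.0, 5.0], [3.0, 6.0]])
x = np.array([2.0, 1.0])

m = mean_shift_step(x, X, bandwidth=2.0)
x_next = x + m  # the candidate moves onto the mean of its neighborhood
print(m, x_next)
```

Adding `m` to `x` lands exactly on the neighborhood mean, which is why the update is described as moving a centroid to the average of the samples around it.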

The flow of the Mean Shift algorithm can be understood as:

1. Calculate the mean shift vector of each sample.
2. Translate each sample point by its mean shift vector.
3. Repeat steps 1 and 2 until the samples converge.
4. Treat samples that converge to the same point as members of the same cluster.
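
The four steps can be sketched in plain NumPy as follows. This is a naive illustration under stated assumptions, not scikit-learn's implementation: it uses a flat kernel, and the function name `naive_mean_shift`, the tolerance, and the rule "points closer than half a bandwidth share a mode" are all choices made for this example.

```python
import numpy as np

def naive_mean_shift(X, bandwidth, max_iter=100, tol=1e-4):
    """Flat-kernel mean shift; returns (modes, labels)."""
    X = np.asarray(X, dtype=float)
    points = X.copy()
    for _ in range(max_iter):
        shifted = np.empty_like(points)
        for i, p in enumerate(points):
            # Step 1: the window mean defines the shift of this point
            in_window = np.linalg.norm(X - p, axis=1) <= bandwidth
            shifted[i] = X[in_window].mean(axis=0)
        # Step 2: translate every point; step 3: stop once nothing moves
        largest_move = np.linalg.norm(shifted - points, axis=1).max()
        points = shifted
        if largest_move < tol:
            break
    # Step 4: points that ended up at (nearly) the same place share a label
    modes, labels = [], np.empty(len(points), dtype=int)
    for i, p in enumerate(points):
        for k, mode in enumerate(modes):
            if np.linalg.norm(p - mode) < bandwidth / 2:  # assumed merge rule
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels

X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
modes, labels = naive_mean_shift(X, bandwidth=2.0)
print(modes)
print(labels)
```

The double loop costs O(n²) per iteration, which is one reason practical implementations rely on neighbor search structures and on seeding from a subset of the points.
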

Advantages and disadvantages of the Mean Shift algorithm

Mean shift does not need the number of clusters to be set in advance, and it can handle clusters of arbitrary shape. It has few parameters, and its results are relatively stable because, unlike K-means, it does not require initial centers to be chosen. On the other hand, mean shift is computationally expensive when the feature space is large, and a poorly chosen bandwidth strongly affects the result: if bandwidth is too small, convergence is slow, and if it is too large, some clusters are lost because they are merged into their neighbors.
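
The sensitivity to bandwidth is easy to observe experimentally. The sketch below fits scikit-learn's `MeanShift` with three bandwidths on the same synthetic blobs and counts the clusters found. The dataset and the three bandwidth values are arbitrary choices for illustration, and the exact counts for the smaller bandwidths depend on the data.

```python
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs

centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.3,
                  random_state=0)

counts = {}
for bandwidth in (0.2, 0.8, 6.0):
    model = MeanShift(bandwidth=bandwidth).fit(X)
    counts[bandwidth] = len(model.cluster_centers_)
    print(f"bandwidth={bandwidth}: {counts[bandwidth]} clusters")
```

With the largest value the window covers the whole dataset, so every point climbs to the same global mean and the three blobs collapse into a single cluster. Small values tend to split each blob into several modes.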

Code implementation of Mean Shift

scikit-learn implements the Mean Shift algorithm. Its signature is:

sklearn.cluster.MeanShift(*, bandwidth=None, seeds=None, bin_seeding=False, min_bin_freq=1, cluster_all=True, n_jobs=None, max_iter=300)

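The most important parameter is bandwidth, the bandwidth of the kernel (a flat kernel in scikit-learn's implementation, although older versions of its documentation describe it as an RBF kernel). If it is not specified, it is estimated with sklearn.cluster.estimate_bandwidth. The seeds parameter gives the seeds used to initialize the kernels. If it is not specified, every sample is used as a seed, unless bin_seeding=True, in which case the seeds are taken from a grid of binned points.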
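
As a rough illustration of the seeding options, the sketch below compares the default seeding with `bin_seeding=True` and uses `sklearn.cluster.get_bin_seeds` to show how many seeds the grid produces. The dataset and the bandwidth of 1.5 are arbitrary assumptions made for this example.

```python
from sklearn.cluster import MeanShift, get_bin_seeds
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=[[0, 0], [5, 5]], cluster_std=0.5,
                  random_state=0)

# Default seeding: every sample starts its own hill climb
all_seeds = MeanShift(bandwidth=1.5).fit(X)

# bin_seeding=True: seeds come from a grid whose cell size is the bandwidth
binned = MeanShift(bandwidth=1.5, bin_seeding=True).fit(X)
seeds = get_bin_seeds(X, bin_size=1.5, min_bin_freq=1)

print(len(X), "samples,", len(seeds), "binned seeds")
print(len(all_seeds.cluster_centers_), "clusters with default seeding")
print(len(binned.cluster_centers_), "clusters with binned seeding")
```

Binned seeding starts far fewer hill climbs, which is why it is commonly used to speed up fitting on larger datasets.
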
Example of use:

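from sklearn.cluster import MeanShift
import numpy as np
# Six 2-D samples forming two visually separate groups
X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
# Fit mean shift with a fixed window radius of 2
clustering = MeanShift(bandwidth=2).fit(X)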
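
After fitting, the results can be read from the estimator's attributes. The short sketch below repeats the fit so that it runs on its own, then prints the labels and cluster centers and assigns two new points with `predict`. The two query points are arbitrary examples.

```python
import numpy as np
from sklearn.cluster import MeanShift

X = np.array([[1, 1], [2, 1], [1, 0],
              [4, 7], [3, 5], [3, 6]])
clustering = MeanShift(bandwidth=2).fit(X)

labels = clustering.labels_            # one cluster index per sample
centers = clustering.cluster_centers_  # one row per discovered mode
new_labels = clustering.predict([[0, 0], [5, 5]])  # nearest-center assignment

print(labels)
print(centers)
print(new_labels)
```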

Mean Shift Application

# Import the required modules
from itertools import cycle

import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Generate sample data: three Gaussian blobs
centers = [[1, 1], [-1, -1], [1, -1]]
X, _ = make_blobs(n_samples=10000, centers=centers, cluster_std=0.6)

# estimate_bandwidth() produces the window size for mean shift.
# With these arguments it randomly picks 500 samples from X, finds for each
# of them the distance to the farthest of its int(500 * 0.2) = 100 nearest
# neighbors, and returns the average of those distances.
es_bandwidth = estimate_bandwidth(X, quantile=0.2, n_samples=500)

# Fit the model and read back the labels and cluster centers
MS = MeanShift(bandwidth=es_bandwidth)
MS.fit(X)
labels = MS.labels_
cluster_centers = MS.cluster_centers_
uni_labels = np.unique(labels)
n_clusters_ = len(uni_labels)

# Visualize the clustering results of the algorithm
colors = cycle('bgrcmykbgrcmykbgrcmykbgrcmyk')
for k, col in zip(range(n_clusters_), colors):
    my_members = labels == k
    cluster_center = cluster_centers[k]
    plt.plot(X[my_members, 0], X[my_members, 1], col + '.')
    plt.plot(cluster_center[0], cluster_center[1], 'o', markerfacecolor=col,
             markeredgecolor='k', markersize=14)
plt.show()
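
To see what `estimate_bandwidth` computes, the sketch below reproduces its result by hand with `NearestNeighbors`. It reflects my reading of the scikit-learn implementation rather than a documented contract, and it passes `n_samples=None` so that no random subsampling is involved and the two numbers can be compared directly.

```python
import numpy as np
from sklearn.cluster import estimate_bandwidth
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=400, centers=[[1, 1], [-1, -1], [1, -1]],
                  cluster_std=0.6, random_state=0)
quantile = 0.2

# Library estimate; n_samples=None means every sample is used
reference = estimate_bandwidth(X, quantile=quantile, n_samples=None)

# By hand: distance from each sample to its k-th nearest neighbor
# (the sample itself counts as the first neighbor), averaged over all samples
k = max(int(len(X) * quantile), 1)
distances, _ = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
manual = distances.max(axis=1).mean()

print(reference, manual)
```

A larger quantile therefore means more neighbors per sample, a larger window, and usually fewer clusters.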

Practical applications of Mean Shift

Mean shift is a commonly used clustering algorithm. Some of its practical applications are shown below:

1. Simple clustering

Mean shift clustering is somewhat similar to density-based clustering: starting from each sample point, it finds the corresponding local maximum of the probability density and assigns the point to that maximum, which completes the clustering.

2. Image segmentation

Image segmentation is essentially clustering, although it has its own particularities compared with simple clustering. Mean shift segments an image by clustering its pixels in their feature space.
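
A minimal sketch of the idea on a synthetic image: each pixel becomes one sample described by its color, `MeanShift` clusters the samples, and the labels are reshaped back onto the pixel grid. The image, the bandwidth of 0.3, and the color-only feature vector are assumptions made for this example. Real segmentation pipelines often add pixel coordinates to the features.

```python
import numpy as np
from sklearn.cluster import MeanShift

rng = np.random.default_rng(0)

# Synthetic 20x20 RGB image: dark left half, bright right half, mild noise
image = np.zeros((20, 20, 3))
image[:, :10] = [0.2, 0.2, 0.2]
image[:, 10:] = [0.8, 0.8, 0.8]
image += rng.normal(scale=0.02, size=image.shape)

# One feature vector (r, g, b) per pixel
pixels = image.reshape(-1, 3)
labels = MeanShift(bandwidth=0.3).fit(pixels).labels_

# Put the labels back on the pixel grid to obtain the segmentation map
segments = labels.reshape(image.shape[:2])
print(np.unique(segments))
```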

3. Image smoothing

Image smoothing is similar to image segmentation: it also finds the corresponding probability density maximum for each pixel. The main differences are:
a. The iteration does not need to run to convergence; usually one iteration is enough.
b. Once the density maximum has been found, its color features directly overwrite the pixel's own color features.
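
One way to sketch a single smoothing iteration in NumPy is shown below. It assumes a flat kernel over a joint feature of position and intensity, with separate spatial and intensity bandwidths. The bandwidth values, the test image, and the variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy 16x16 grayscale image with a vertical step edge in the middle
image = np.where(np.arange(16) < 8, 0.2, 0.8) * np.ones((16, 1))
noisy = image + rng.normal(scale=0.05, size=image.shape)

# Joint feature per pixel: (row, col, intensity), each scaled by its bandwidth
spatial_bw, range_bw = 3.0, 0.3
rows, cols = np.indices(noisy.shape)
features = np.column_stack([rows.ravel() / spatial_bw,
                            cols.ravel() / spatial_bw,
                            noisy.ravel() / range_bw])

# One flat-kernel iteration: each pixel takes the mean intensity of all pixels
# whose scaled feature lies within unit distance of its own feature
dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=2)
in_window = (dists <= 1.0).astype(float)
smoothed = (in_window @ noisy.ravel()) / in_window.sum(axis=1)
smoothed = smoothed.reshape(noisy.shape)

print(noisy[:, :8].std(), smoothed[:, :8].std())
```

Because pixels on the other side of the edge differ too much in intensity to fall inside the window, the noise is averaged out within each region while the edge itself stays sharp.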

4. Contour extraction

Likewise, contour extraction is similar to image segmentation, or more precisely, it can be built on top of image segmentation: first segment the image with the mean shift algorithm, then take the boundaries between the different regions to obtain a simple contour.
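
Given a segmentation map, the boundary step can be sketched in a few lines of NumPy. The helper name `label_boundaries` is hypothetical, and the small hand-built label map stands in for the output of a mean shift segmentation.

```python
import numpy as np

def label_boundaries(segments):
    """Mark pixels whose right or lower neighbor carries a different label."""
    edges = np.zeros(segments.shape, dtype=bool)
    edges[:, :-1] |= segments[:, :-1] != segments[:, 1:]
    edges[:-1, :] |= segments[:-1, :] != segments[1:, :]
    return edges

# Stand-in segmentation map: a 3x3 region labeled 1 on a background of 0
segments = np.zeros((6, 6), dtype=int)
segments[2:5, 2:5] = 1

outline = label_boundaries(segments)
print(outline.astype(int))
```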


Original site

Copyright notice
This article was written by [Demeanor 78]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/207/202207261643397428.html