K-means introduction

2022-06-21 10:32:00 I'm afraid I'm not retarded

K-Means

K-Means, i.e. k-means clustering, is a partition-based clustering method.

Working principle:

Starting from the initial cluster centers, compute the distance from each sample to every center and assign each sample to the cluster of its nearest center. Then update the cluster centers, recompute the distance from each sample to the new centers, and reassign the samples to the clusters of the new centers. Repeat until a termination condition is met.

Given N sample points, the steps for clustering them with K-Means are:

  1. Choose the number of clusters k and specify the k initial cluster centers C1, C2, …, Ck;
  2. Compute the distance from each sample point Si to each of the k centers and assign the point to the cluster of its nearest center Cj, where i ∈ {1, …, N} and j ∈ {1, …, k};
  3. Recompute the center of each of the k clusters and update the center positions C1, C2, …, Ck;
  4. Repeat steps 2 and 3 until the centers no longer move or move by less than an agreed threshold, or until the predefined maximum number of iterations is reached. The result is the final clustering.
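
The four steps can be written compactly as follows. This is only a notational sketch: the label c_i, the index set G_j, the threshold ε and the iteration cap T_max are symbols introduced here for illustration, and the Euclidean norm is assumed as the distance.

```latex
\begin{aligned}
\text{assignment:}\quad & c_i = \arg\min_{j \in \{1,\dots,k\}} \lVert S_i - C_j \rVert_2, \qquad i = 1,\dots,N \\
\text{update:}\quad & C_j = \frac{1}{|G_j|} \sum_{i \in G_j} S_i, \qquad G_j = \{\, i : c_i = j \,\} \\
\text{stop when:}\quad & \max_{j} \lVert C_j^{\text{new}} - C_j^{\text{old}} \rVert_2 < \varepsilon
  \quad \text{or the iteration count reaches } T_{\max}
\end{aligned}
```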

Implementation steps

Step one: determine the number of clusters, the initial cluster centers, and the distance formula. Ways to choose the number of clusters:

  • Observation
  • Enumeration
  • Other technical means

Distance formula: Euclidean distance is the common choice.

Step two: compute the distance from each point to each cluster center and assign the point to the nearest center's cluster.

Step three: compute the center of each current cluster and update the position of cluster center Ck.

Repeat step two: reassign each sample Si according to the new cluster centers Ck.

Repeat step three: recompute the cluster centers from the latest clusters and update the value of each center Ck.

Repeat steps two and three until the cluster centers no longer move or the number of iterations exceeds the preset threshold. The result is the final clustering.
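
As a small illustration of the distance formula, the Euclidean distance between two points can be computed as below. The function name `euclidean_distance` is chosen here for the example and is not from the original article.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


d = euclidean_distance((0, 0), (3, 4))
```

For the points (0, 0) and (3, 4) the squared differences are 9 and 16, so the distance is 5.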
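
A minimal sketch of this assignment step, assuming samples and centers are tuples of numbers. The helper name `assign_to_nearest` and the sample data are made up for the example; `math.dist` (Python 3.8+) supplies the Euclidean distance.

```python
import math


def assign_to_nearest(samples, centers):
    """Return, for each sample, the index of its nearest center."""
    labels = []
    for s in samples:
        distances = [math.dist(s, c) for c in centers]
        # On a tie, index() picks the center with the lowest index.
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical data: two points near (1, 1) and two near (9, 9).
samples = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.0), (9.0, 9.0)]
labels = assign_to_nearest(samples, centers)
```

Here the first two samples get label 0 and the last two get label 1.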
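
The update step can be sketched as taking the coordinate-wise mean of the samples in each cluster. The name `update_centers` and the data are illustrative. Keeping the old center for an empty cluster is a choice made for this sketch, since the article does not say how to handle that case.

```python
def update_centers(samples, labels, centers):
    """Return new centers: the mean of the samples assigned to each cluster."""
    new_centers = []
    for j, old_center in enumerate(centers):
        members = [s for s, lab in zip(samples, labels) if lab == j]
        if not members:
            # Assumption of this sketch: an empty cluster keeps its old center.
            new_centers.append(old_center)
            continue
        # zip(*members) groups the coordinates dimension by dimension.
        new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return new_centers


# Hypothetical data with labels as produced by an assignment step.
samples = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
labels = [0, 0, 1, 1]
centers = [(1.0, 1.0), (9.0, 9.0)]
new_centers = update_centers(samples, labels, centers)
```

With this data the centers move to (1.25, 1.5) and (8.5, 8.75).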

Pseudocode

choose k points as the initial cluster centers

repeat

    assign each sample point to the nearest cluster center, forming k clusters
    recompute the center of each cluster
until the clusters do not change or the maximum number of iterations is reached
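
The pseudocode could be turned into runnable Python along the following lines. This is a sketch under stated assumptions, not a reference implementation: initial centers are drawn at random from the samples, the distance is Euclidean, an empty cluster keeps its previous center, and the parameter names `max_iter`, `tol` and `seed` are invented for the example.

```python
import math
import random


def kmeans(samples, k, max_iter=100, tol=1e-9, seed=0):
    """Plain k-means on a list of equal-length numeric tuples.

    Returns (centers, labels). Initial centers are k samples drawn at
    random without replacement; `seed` only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    centers = rng.sample(samples, k)
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every sample.
        labels = [min(range(k), key=lambda j: math.dist(s, centers[j]))
                  for s in samples]
        # Update step: mean of each cluster; an empty cluster keeps its center.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])
        # Stop once no center has moved by more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


# Hypothetical data: three points near (1, 1.5) and three near (8.5, 9).
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
           (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels = kmeans(samples, k=2)
```

On this well-separated toy data the first three samples end up in one cluster and the last three in the other. Which of the two gets label 0 depends on the random initial centers.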


Advantages and disadvantages of k-means

  • Advantages
    • The principle is simple, easy to understand, and easy to implement
    • The clustering results are easy to interpret
    • The clustering results are generally good
  • Disadvantages
    • The number of clusters k must be specified in advance, and different values of k give quite different clustering results
    • The initial choice of the k cluster centers affects the final result, so different choices may give different results
    • Only roughly spherical clusters are recognized well; results on non-spherical clusters are poor
    • With many sample points, the amount of computation is large
    • Sensitive to outliers, which require special treatment
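
The sensitivity to outliers follows from the cluster center being a mean. The toy example below, with made-up data and a hypothetical helper `centroid`, shows how a single distant point pulls the center away from the bulk of the cluster.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


# Four points forming a tight cluster, then the same cluster plus one outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
with_outlier = cluster + [(50.0, 50.0)]

clean_center = centroid(cluster)
shifted_center = centroid(with_outlier)
```

The center of the clean cluster is (1.5, 1.5). Adding the single point (50, 50) moves it to (11.2, 11.2), far from every one of the original four points.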

Copyright notice
This article was written by [I'm afraid I'm not retarded]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/02/202202221440357293.html