Shandong University machine learning experiment VI k-means
2022-06-11 06:08:00 【Tcoder-l3est】
Shandong University Machine Learning Experiment 6 Report
Experiment hours: 4    Experiment date: 2021.11.29
Experiment topic: Experiment 6: K-Means
Experiment Purpose
In this exercise, you will use K-means to compress an image by reducing the number of colors it contains.
Experimental environment
Software environment
Windows 10 + Matlab R2020a
CPU: Intel Core i5-8300H @ 2.30 GHz
Experimental steps and contents
Understanding K-Means
K-Means (Lloyd 1957)
Update each point's class: compute the distance from each data point $x_i$ to each class centroid $\mu_k$, and assign the point to the nearest class.

Update each cluster mean: take the mean of the points currently assigned to the cluster; this mean becomes the new centroid.

When the centroids $\mu_k$ (or the loss) no longer change, the iteration is considered complete.
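In formulas, these two alternating steps are the standard K-means updates (consistent with the loss defined below):

$z_i = \arg\min_k \| x_i - \mu_k \|^2$ (class update)

$\mu_k = \frac{\sum_i z_{i,k} \, x_i}{\sum_i z_{i,k}}$ (mean update)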
Loss:

Loss Function
$\mu_k$ is the centroid of the $k$-th cluster, and $z_{i,k}$ indicates whether $x_i$ belongs to $C_k$; each data point thus has an assignment vector $z_i$. The $loss$ of a single data point $x_i$ is defined as:

$loss(x_i) = \sum_{k=1}^{K} z_{i,k} \, \| x_i - \mu_k \|^2$
The total Loss Function, where $X$ is $N \times D$, $Z$ is $N \times K$, and $\mu$ is $K \times D$:

$L(X, Z, \mu) = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{i,k} \, \| x_i - \mu_k \|^2$
That is, $X$ holds all $N$ data points of dimension $D$, $Z$ holds the cluster assignments of all $N$ points, and $\mu$ holds the $K$ cluster centers of dimension $D$. K-Means minimizes this Loss Function.
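As a small sketch (variable names are mine, not the lab code's): with X an N x D data matrix, Z an N x K one-hot assignment matrix, and mu a K x D centroid matrix, this loss can be computed in two vectorized lines:

D2 = sum(X.^2,2) + sum(mu.^2,2)' - 2*X*mu'; % N x K squared distances (implicit expansion, R2016b+)
L = sum(sum(Z .* D2));                      % K-means loss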
K-Means Objective
K-Means is a heuristic method for this NP-hard problem; the objective is non-convex, so there are many local minima.
The algorithm is described as:

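A compact, vectorized sketch of this loop (my own illustration; the pixel-wise implementation actually used in this experiment follows below):

function [mu, z] = kmeans_sketch(X, K, max_iter, tol)
% X: N x D data; K: number of clusters
N = size(X,1);
mu = X(randperm(N,K), :); % initialize with K distinct random points
for it = 1:max_iter
    D2 = sum(X.^2,2) + sum(mu.^2,2)' - 2*X*mu'; % N x K squared distances
    [~, z] = min(D2, [], 2);                    % class update: nearest centroid
    mu_old = mu;
    for k = 1:K                                 % mean update: per-cluster average
        if any(z == k)                          % keep the old centroid if a cluster is empty
            mu(k,:) = mean(X(z == k, :), 1);
        end
    end
    if sum(sqrt(sum((mu - mu_old).^2, 2))) < tol % centroids stopped moving
        break
    end
end
end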
Choosing K
Choosing the number of clusters is itself a problem.
You can try different values of K, plot the loss for each K, and pick the elbow point, as sketched below.
You can also use a criterion such as AIC.
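A sketch of the elbow method, reusing the kmeans_sketch function above (the range of K values is an arbitrary choice):

Ks = 2:2:32;
losses = zeros(size(Ks));
for t = 1:numel(Ks)
    mu = kmeans_sketch(X, Ks(t), 200, 1e-5);
    D2 = sum(X.^2,2) + sum(mu.^2,2)' - 2*X*mu'; % squared distance to each centroid
    losses(t) = sum(min(D2, [], 2));            % each point contributes its nearest-centroid distance
end
plot(Ks, losses, '-o'), xlabel('K'), ylabel('Loss') % pick the elbow of this curve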
Task Description:
Your task is to compute 16 cluster centroids from this image, with each centroid being a vector of length three that holds a set of RGB values. That is, compute 16 clusters, where each centroid is a $3 \times 1$ RGB pixel value.
Since computing the centroids on the $538 \times 538 \times 3$ image directly would take a long time, we first train on the small image ($128 \times 128 \times 3$) and then apply the result to the large image, obtaining a new image that uses only 16 colors.
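One convenient way to feed an image into a generic K-means routine is to flatten it into a pixel list (a sketch; the lab code below instead loops over pixels in place):

small = double(imread('bird_small.tiff')); % 128 x 128 x 3
X = reshape(small, [], 3);                 % 16384 x 3: one RGB row per pixel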
Experiment steps:
K-Means Algorithm
Random initialization: randomly choose 16 pixels from the whole image as the initial centroids. The random-number scheme I adopted guarantees sampling without repetition.
Function code:
%% Select the initial points randomly; returns K * 3
function sample_Datas = sample_random(num,datas,dimx,dimy)
% datas is the raw data; num is the number of samples
sample_Datas = zeros(num,3);
a = rand(dimx,1);
b = rand(dimy,1);
[vala,posa] = sort(a);
[valb,posb] = sort(b);
chose_x = posa(1:num,1);
chose_y = posb(1:num,1);
for i=1:num
sample_Datas(i,:) = datas(chose_x(i),chose_y(i),:);
end
end

Compute each pixel's nearest centroid: traverse every pixel; each pixel is an RGB vector of size $3 \times 1$. Compute its distance to every centroid and assign the pixel to the class of the nearest one.

Function code:
%% Compute each pixel's category; returns Clusters: N * N, value = category
function Clusters = calculate_Ci(centroids,dimx,dimy,datas,K)
Clusters = [];
% For each pixel, compute z(i,j)
for i= 1:dimy
for j = 1:dimx
% obtain xi
pixel_rgb = [datas(i,j,1),datas(i,j,2),datas(i,j,3)];
diff = ones(K,1)*pixel_rgb - centroids;
distance = sum(diff.^2,2);
[~,class] = min(distance); % index of the nearest centroid
Clusters(i,j) = class;
end
end
end

Update the centroids $\mu$ via:

Function code:
%% Update the centroids; returns K * 3
function newcenters = updatemeans(Clusters,dimx,dimy,dimz,datas,K)
newcenters = zeros(K,dimz);
nums = zeros(K,1); % pixel count per cluster (zeros(K) would create a K-by-K matrix)
sumred = zeros(K,1);
sumgreen = zeros(K,1);
sumblue = zeros(K,1);
for i=1:dimx
for j=1:dimy
class = Clusters(i,j);
nums(class) = nums(class) + 1;
sumred(class) = sumred(class) + datas(i,j,1);
sumgreen(class) = sumgreen(class) + datas(i,j,2);
sumblue(class) = sumblue(class) + datas(i,j,3);
end
end
for i=1:K
if nums(i) ~= 0
sumred(i) = sumred(i) / nums(i);
sumgreen(i) = sumgreen(i) / nums(i);
sumblue(i) = sumblue(i) / nums(i);
end
end
newcenters = [sumred,sumgreen,sumblue];
newcenters = round(newcenters);
end
Convergence check:
I cap the iterations at 150; additionally, if the total movement of the centroids between two successive updates is $< 10^{-5}$, I consider them unchanged and declare convergence.
Implementing the above process finally yields the cluster assignment of each pixel, **Clusters** of size $N \times N$, and the RGB value of each centroid, **Centroids** of size $K \times 3$.
Reassigning Colors to The Large Image
Here I process the small and the large image separately.
Small image: replace each pixel with the RGB value of its assigned class.
Large image: for each pixel, find the nearest of the trained centroids and replace its RGB value, yielding the new image; a vectorized alternative is sketched below.
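The recoloring can also be done without an explicit double loop. A sketch, assuming centroids is the trained K x 3 matrix (the other variable names are mine):

large = double(imread('bird_large.tiff'));
P = reshape(large, [], 3);                           % pixels as rows
D2 = sum(P.^2,2) + sum(centroids.^2,2)' - 2*P*centroids';
[~, cls] = min(D2, [], 2);                           % nearest centroid per pixel
new_large = reshape(centroids(cls,:), size(large));  % replace RGB values
imshow(uint8(new_large))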
The result of one random initialization:
Small image:

Large image (higher resolution):

Choosing A Good Initial Point
To choose a good initial point, I repeat the above process a set number of times with different random initializations, aiming to find a better starting point.
After each run, I compare the resulting image with the original pixel by pixel, square the RGB differences, and take their mean as the loss; the image with the lowest loss is kept as the Best New Small / Large Image.
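For reference, this image loss reduces to one vectorized line, equivalent to the calculate_Kmeans_Loss function in the source code below (ori and new are assumed to be double image arrays of equal size):

img_loss = mean((ori(:) - new(:)).^2); % mean squared RGB difference over all pixels and channels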
The results are as follows:
Small image:

Large image:

Compared with the original image, the result is correct:

I also recorded the relation between the loss and the different random initializations, and plotted it:
It can be seen that the overall result depends on the choice of initial points; the algorithm may end up in a local minimum.

Conclusions and Reflections
Overall structure of the K-Means algorithm
Namely, the algorithm alternates between two steps:
Fix the centroids, then update the class of each point;
Fix the class of each point, then update the centroid positions accordingly.
Some Limitations Of K-Means
K-Means has a limited scope of applicability:
- Works well when every cluster is about the same size
- Works well on round-shaped clusters
- Performs poorly on non-convex shapes
Kernel K-Means
That is, map the data to a high-dimensional space to handle the non-convex case. The mapping is never carried out explicitly; instead, everything is expressed through the kernel matrix $K$.
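Concretely, the distance from a mapped point $\phi(x_i)$ to the mean $m_k$ of cluster $C_k$ can be computed from kernel entries alone (the standard kernel K-means identity):

$\| \phi(x_i) - m_k \|^2 = K_{ii} - \frac{2}{|C_k|} \sum_{j \in C_k} K_{ij} + \frac{1}{|C_k|^2} \sum_{j,l \in C_k} K_{jl}$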
Hierarchical Clustering
Hierarchical clustering can proceed bottom-up (agglomerative) or top-down (divisive), as sketched below.
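For instance, bottom-up clustering is available in MATLAB's Statistics and Machine Learning Toolbox (a sketch, not part of this experiment; X is a pixel list as above):

Z = linkage(X, 'ward');         % agglomerative: repeatedly merge the closest clusters
T = cluster(Z, 'maxclust', 16); % cut the tree into 16 clusters
dendrogram(Z)                   % visualize the merge hierarchy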
Experiment source code
Lab61.m
clear,clc
%% K-means
% small_img : 128 * 128 * 3
% large_img : 538 * 538 * 3
small_img = imread('bird_small.tiff');
large_img = imread('bird_large.tiff');
% Convert to double matrices; A(50,33,3) is the Blue value of the pixel at y = 50, x = 33
small_img_matrix = double(small_img);
large_img_matrix = double(large_img);
%% show image
% imshow(large_img)
%%
K = 16; % 16 clusters
[small_dimx,small_dimy,small_dimz] = size(small_img_matrix);
[large_dimx,large_dimy,large_dimz] = size(large_img_matrix);
max_iter_times = 200;
convergence_condition = 10^-5;
[centroids,Clusters] = kmeans(K,small_img_matrix,max_iter_times,convergence_condition);
%% Redisplay the small picture
new_small_pic=[];
new_large_pic=[];
% Small picture processing
for i=1:small_dimx
for j=1:small_dimy
new_small_pic(i,j,:) = centroids(Clusters(i,j),:);
end
end
% Big picture processing
for i=1:large_dimx % rows
for j=1:large_dimy % columns
pixel_rgb = [large_img_matrix(i,j,1),large_img_matrix(i,j,2),large_img_matrix(i,j,3)];
diff = ones(K,1)*pixel_rgb - centroids;
distance = sum(diff.^2,2);
[~,class] = min(distance); % Get categories
new_large_pic(i,j,:) = centroids(class,:);
end
end
%% Show small pictures and Big picture
figure
subplot(2,1,1);
imshow(uint8(small_img_matrix)),title('Origin Small Image');
subplot(2,1,2);
imshow(uint8(new_small_pic)),title('New Small Image')
%%
figure
subplot(2,1,1);
imshow(uint8(large_img_matrix)),title('Origin Large Image');
subplot(2,1,2);
imshow(uint8(new_large_pic)),title('New Large Image')
%% K-Means function
function [centroids,Clusters] = kmeans(K,datas,max_iter_times,convergence_condition)
% Returns the centroids, K * dimz
centroids = [];
[dimy,dimx,dimz] = size(datas); % size() returns rows first (assigned to dimy here); the test images are square, so the naming is harmless
% Initialize the center point randomly K*3
centroids = sample_random(K,datas,dimx,dimy);
for it = 1 : max_iter_times
% assignment step: find each pixel's nearest centroid
Clusters = calculate_Ci(centroids,dimx,dimy,datas,K);
% Update the center point
new_centroids = updatemeans(Clusters,dimx,dimy,dimz,datas,K);
% See if it converges
convergence = judge(new_centroids,centroids,convergence_condition);
centroids = new_centroids;
if convergence
break
end
end
end
%% Select the initial points randomly; returns K * 3
function sample_Datas = sample_random(num,datas,dimx,dimy)
% datas is the raw data; num is the number of samples
sample_Datas = zeros(num,3);
a = rand(dimx,1);
b = rand(dimy,1);
[vala,posa] = sort(a);
[valb,posb] = sort(b);
chose_x = posa(1:num,1);
chose_y = posb(1:num,1);
for i=1:num
sample_Datas(i,:) = datas(chose_x(i),chose_y(i),:);
end
end
%% Compute each pixel's category; returns Clusters: N * N, value = category
function Clusters = calculate_Ci(centroids,dimx,dimy,datas,K)
Clusters = [];
% For each pixel, compute z(i,j)
for i= 1:dimy
for j = 1:dimx
% obtain xi
pixel_rgb = [datas(i,j,1),datas(i,j,2),datas(i,j,3)];
diff = ones(K,1)*pixel_rgb - centroids;
distance = sum(diff.^2,2);
[~,class] = min(distance); % index of the nearest centroid
Clusters(i,j) = class;
end
end
end
%% Update the centroids; returns K * 3
function newcenters = updatemeans(Clusters,dimx,dimy,dimz,datas,K)
newcenters = zeros(K,dimz);
nums = zeros(K,1); % pixel count per cluster (zeros(K) would create a K-by-K matrix)
sumred = zeros(K,1);
sumgreen = zeros(K,1);
sumblue = zeros(K,1);
for i=1:dimx
for j=1:dimy
class = Clusters(i,j);
nums(class) = nums(class) + 1;
sumred(class) = sumred(class) + datas(i,j,1);
sumgreen(class) = sumgreen(class) + datas(i,j,2);
sumblue(class) = sumblue(class) + datas(i,j,3);
end
end
for i=1:K
if nums(i) ~= 0
sumred(i) = sumred(i) / nums(i);
sumgreen(i) = sumgreen(i) / nums(i);
sumblue(i) = sumblue(i) / nums(i);
end
end
newcenters = [sumred,sumgreen,sumblue];
newcenters = round(newcenters);
end
%% Check for convergence
function convergence = judge(newcenter,oldcenter,condition)
convergence = 0;
d = sum(sqrt(sum((newcenter - oldcenter).^2, 2)));
if d < condition
convergence = 1;
end
end
%% Return the row and column coordinates of all pixels in cluster val (helper; unused in the main flow)
function [row,col] = findrc(Clusters,val)
[dimx,dimy] = size(Clusters);
row = [];
col = [];
for i=1:dimx
for j=1:dimy
if Clusters(i,j) == val
row = [row;i];
col = [col;j];
end
end
end
end
Lab62.m (selects the best result over multiple random initializations)
clear,clc
%% Run K-means several times with different initial points and keep the best result.
%% K-means
% small_img : 128 * 128 * 3
% large_img : 538 * 538 * 3
small_img = imread('bird_small.tiff');
large_img = imread('bird_large.tiff');
% Convert to double matrices; A(50,33,3) is the Blue value of the pixel at y = 50, x = 33
small_img_matrix = double(small_img);
large_img_matrix = double(large_img);
Best_new_small_pic=[];
Best_new_large_pic=[];
loss_small = 1e9; % best-so-far losses, initialized once before the restart loop
loss_large = 1e9;
%% show image
% imshow(large_img)
%%
inital_times = 10;
for inital_time = 1 : inital_times
K = 16; % 16 clusters
[small_dimx,small_dimy,small_dimz] = size(small_img_matrix);
[large_dimx,large_dimy,large_dimz] = size(large_img_matrix);
max_iter_times = 200;
convergence_condition = 10^-5;
[centroids,Clusters] = kmeans(K,small_img_matrix,max_iter_times,convergence_condition);
%% Rebuild the small image
new_small_pic=[];
new_large_pic=[];
% small image processing
for i=1:small_dimx
for j=1:small_dimy
new_small_pic(i,j,:) = centroids(Clusters(i,j),:);
end
end
new_loss_small = calculate_Kmeans_Loss(small_img_matrix,new_small_pic,small_dimx,small_dimy);
if new_loss_small < loss_small
loss_small = new_loss_small;
Best_new_small_pic = new_small_pic;
end
% large image processing
for i=1:large_dimx
for j=1:large_dimy
pixel_rgb = [large_img_matrix(i,j,1),large_img_matrix(i,j,2),large_img_matrix(i,j,3)];
diff = ones(K,1)*pixel_rgb - centroids;
distance = sum(diff.^2,2);
[~,class] = min(distance); % nearest centroid
new_large_pic(i,j,:) = centroids(class,:);
end
end
new_loss_large = calculate_Kmeans_Loss(large_img_matrix,new_large_pic,large_dimx,large_dimy);
if new_loss_large < loss_large
loss_large = new_loss_large;
Best_new_large_pic = new_large_pic;
end
end
%% Show the small and large images
figure
subplot(2,1,1);
imshow(uint8(small_img_matrix)),title('Origin Small Image');
subplot(2,1,2);
imshow(uint8(Best_new_small_pic)),title('Best New Small Image')
imwrite(uint8(Best_new_small_pic),'Best New Small Image.png')
figure
subplot(2,1,1);
imshow(uint8(large_img_matrix)),title('Origin Large Image');
subplot(2,1,2);
imshow(uint8(Best_new_large_pic)),title('Best New Large Image')
imwrite(uint8(Best_new_large_pic),'Best New Large Image.png')
% The local functions kmeans, sample_random, calculate_Ci, updatemeans, and judge are identical to those in Lab61.m above.
%% Compute the image loss after K-Means: mean squared RGB difference over all pixels, averaged over the three channels
function Loss1 = calculate_Kmeans_Loss(ori_img,new_img,dimx,dimy)
Loss = zeros(3,1);
div_num = dimx*dimy;
for i=1:dimx
for j = 1 :dimy
Loss(1) = Loss(1) + (ori_img(i,j,1)-new_img(i,j,1)).^2 / div_num;
Loss(2) = Loss(2) + (ori_img(i,j,2)-new_img(i,j,2)).^2 / div_num;
Loss(3) = Loss(3) + (ori_img(i,j,3)-new_img(i,j,3)).^2 / div_num;
end
end
Loss1 = sum(Loss) / 3;
end